From patchwork Sun Mar 19 00:14:56 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180128
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 01/40] Documentation/x86: Add CET shadow stack description
Date: Sat, 18 Mar 2023 17:14:56 -0700
Message-Id: <20230319001535.23210-2-rick.p.edgecombe@intel.com>
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

Introduce a new document on Control-flow Enforcement Technology (CET).

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Add more details to docs (Szabolcs Nagy)
 - More doc tweaks (Dave Hansen)

v5:
 - Literal format tweaks (Bagas Sanjaya)
 - Update EOPNOTSUPP text due to unification after comment from (Kees)
 - Update 32 bit signal support with new behavior
 - Remove capitalization on shadow stack (Boris)
 - Fix typo

v4:
 - Drop clearcpuid piece (Boris)
 - Add some info about 32 bit

v3:
 - Clarify kernel IBT is supported by the kernel. (Kees, Andrew Cooper)
 - Clarify which arch_prctl's can take multiple bits. (Kees)
 - Describe ASLR characteristics of thread shadow stacks. (Kees)
 - Add exec section. (Andrew Cooper)
 - Fix some capitalization (Bagas Sanjaya)
 - Update new location of enablement status proc.
 - Add info about new user_shstk software capability.
 - Add more info about what the kernel pushes to the shadow stack on
   signal.
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/shstk.rst | 169 ++++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+)
 create mode 100644 Documentation/x86/shstk.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index c73d133fd37c..8ac64d7de4dc 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -22,6 +22,7 @@ x86-specific Documentation
    mtrr
    pat
    intel-hfi
+   shstk
    iommu
    intel_txt
    amd-memory-encryption
diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst
new file mode 100644
index 000000000000..f09afa504ec0
--- /dev/null
+++ b/Documentation/x86/shstk.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+Control-flow Enforcement Technology (CET) Shadow Stack
+======================================================
+
+CET Background
+==============
+
+Control-flow Enforcement Technology (CET) covers several related x86 processor
+features that provide protection against control flow hijacking attacks. CET
+can protect both applications and the kernel.
+
+CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
+is a secondary stack allocated from memory which cannot be directly modified by
+applications. When executing a CALL instruction, the processor pushes the
+return address to both the normal stack and the shadow stack. Upon
+function return, the processor pops the shadow stack copy and compares it
+to the normal stack copy. If the two differ, the processor raises a
+control-protection fault. IBT verifies indirect CALL/JMP targets are intended
+as marked by the compiler with 'ENDBR' opcodes. Not all CPUs have both Shadow
+Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace
+shadow stack and kernel IBT are supported.
+
+Requirements to use Shadow Stack
+================================
+
+To use userspace shadow stack you need HW that supports it, a kernel
+configured with it and userspace libraries compiled with it.
+
+The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
+stacks can be disabled at runtime with the kernel parameter: nousershstk.
+
+To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
+are required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET. "user_shstk" means that userspace shadow stack is supported on the current
+kernel and HW.
+
+Application Enabling
+====================
+
+An application's CET capability is marked in its ELF note and can be verified
+from readelf/llvm-readelf output::
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these application markers directly. Applications
+or loaders must enable CET features using the interface described in the
+next section. Typically this would be done in the dynamic loader or static
+runtime objects, as is the case in GLIBC.
+
+Enabling arch_prctl()'s
+=======================
+
+ELF features should be enabled by the loader using the below arch_prctl's. They
+are only supported in 64 bit user applications. These operate on the features
+on a per-thread basis. The enablement status is inherited on clone, so if the
+feature is enabled on the first thread, it will propagate to all the threads
+in an app.
+
+arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
+    Enable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
+    Disable a single feature specified in 'feature'. Can only operate on
+    one feature at a time.
+
+arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
+    Lock in features at their current enabled or disabled status. 'features'
+    is a mask of all features to lock. All bits set are processed, unset bits
+    are ignored. The mask is ORed with the existing value. So any feature bits
+    set here cannot be enabled or disabled afterwards.
+
+The return values are as follows. On success, return 0. On error, errno can
+be::
+
+        -EPERM if any of the passed features are locked.
+        -ENOTSUPP if the feature is not supported by the hardware or
+         kernel.
+        -EINVAL if the arguments are invalid (non-existing feature, etc.)
+
+The feature bits supported are::
+
+    ARCH_SHSTK_SHSTK - Shadow stack
+    ARCH_SHSTK_WRSS  - WRSS
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
+
+Proc Status
+===========
+To check if an application is actually running with shadow stack, the
+user can read /proc/$PID/status. It will report "wrss" or "shstk"
+depending on what is enabled. The lines look like this::
+
+    x86_Thread_features: shstk wrss
+    x86_Thread_features_locked: shstk wrss
+
+Implementation of the Shadow Stack
+==================================
+
+Shadow Stack Size
+-----------------
+
+A task's shadow stack is allocated from memory to a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. In the case
+of the clone3 syscall, there is a stack size passed in and shadow stack
+uses this instead of the rlimit.
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack. Because
+the shadow stack stores only return addresses, a large shadow stack covers
+the condition that both the program stack and the signal alternate stack run
+out.
+
+When a signal happens, the old pre-signal state is pushed on the stack. When
+shadow stack is enabled, the shadow stack specific state is pushed onto the
+shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
+in a special format with bit 63 set. On sigreturn this old SSP token is
+verified and restored by the kernel. The kernel will also push the normal
+restorer address to the shadow stack to help userspace avoid a shadow stack
+violation on the sigreturn path that goes through the restorer.
+
+So the shadow stack signal frame format is as follows::
+
+    |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
+                    (bit 63 set to 1)
+    |        ...| - Other state may be added in the future
+
+
+32 bit ABI signals are not supported in shadow stack processes. Linux prevents
+32 bit execution while shadow stack is enabled by allocating shadow stacks
+outside of the 32 bit address space. When execution enters 32 bit mode, either
+via far call or returning to userspace, a #GP is generated by the hardware
+which will be delivered to the process as a segfault. When transitioning to
+userspace the register state will be as if the userspace ip being returned to
+caused the segfault.
+
+Fork
+----
+
+The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
+to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
+shadow access triggers a page fault with the shadow stack access bit set
+in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread. New shadow stack creation behaves like mmap() with respect
+to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
+disabled.
+
+Exec
+----
+
+On exec, shadow stack features are disabled by the kernel. At that point,
+userspace can choose to re-enable or lock them.
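
As an illustration of the "Shadow Stack Size" and "Signal" sections of the
document added above, here is a minimal C sketch of the two pieces of
arithmetic they describe: the MIN(RLIMIT_STACK, 4 GB) default size and the
sigframe token with bit 63 set. The helper and macro names are hypothetical
and are not taken from this series, and the kernel's real sigreturn checks
may well be stricter than the single bit test shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 4 GB cap quoted in the text, as a byte count. */
#define SHSTK_SIZE_CAP		(4ULL << 30)

/* Assumption: the sigframe token marker is bit 63, as the text states. */
#define SHSTK_SIGFRAME_BIT	(1ULL << 63)

/* Hypothetical helper: default shadow stack size, MIN(RLIMIT_STACK, 4 GB). */
static uint64_t shstk_default_size(uint64_t rlimit_stack)
{
	return rlimit_stack < SHSTK_SIZE_CAP ? rlimit_stack : SHSTK_SIZE_CAP;
}

/* Hypothetical helper: build the token pushed on signal delivery. */
static uint64_t shstk_sigframe_token(uint64_t old_ssp)
{
	/* The old SSP itself is expected to have bit 63 clear. */
	assert(!(old_ssp & SHSTK_SIGFRAME_BIT));
	return old_ssp | SHSTK_SIGFRAME_BIT;
}

/*
 * Hypothetical helper: on sigreturn, accept only values with bit 63 set
 * and recover the old SSP by clearing that bit. Returns false, leaving
 * *old_ssp untouched, if the value is not in token format.
 */
static bool shstk_sigframe_decode(uint64_t token, uint64_t *old_ssp)
{
	if (!(token & SHSTK_SIGFRAME_BIT))
		return false;
	*old_ssp = token & ~SHSTK_SIGFRAME_BIT;
	return true;
}
```

With an 8 MB RLIMIT_STACK the size helper returns 8 MB, while an unlimited
rlimit is capped at 4 GB. A value without bit 63 set is rejected by the
decode helper rather than being treated as an SSP.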
From patchwork Sun Mar 19 00:14:57 2023
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180130
From: Rick Edgecombe
Subject: [PATCH v8 02/40] x86/shstk: Add Kconfig option for shadow stack
Date: Sat, 18 Mar 2023 17:14:57 -0700
Message-Id: <20230319001535.23210-3-rick.p.edgecombe@intel.com>
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

Shadow stack provides protection for applications against function return
address corruption. It is active when the processor supports it, the
kernel has CONFIG_X86_USER_SHADOW_STACK enabled, and the application is
built for the feature. This is only implemented for the 64-bit kernel. When
it is enabled, legacy non-shadow stack applications continue to work, but
without protection.

Since there is another feature that utilizes CET (Kernel IBT) that will share implementation with shadow stacks, create CONFIG_CET to signify that at least one CET feature is configured. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v5: - Remove capitalization of shadow stack (Boris) v3: - Add X86_CET (Kees) - Add back WRUSS dependency (Kees) - Fix verbiage (Dave) - Change from promt to bool (Kirill) - Add more to commit log v2: - Remove already wrong kernel size increase info (tlgx) - Change prompt to remove "Intel" (tglx) - Update line about what CPUs are supported (Dave) Yu-cheng v25: - Remove X86_CET and use X86_SHADOW_STACK directly. --- arch/x86/Kconfig | 24 ++++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a825bf031f49..f03791b73f9f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1851,6 +1851,11 @@ config CC_HAS_IBT (CC_IS_CLANG && CLANG_VERSION >= 140000)) && \ $(as-instr,endbr64) +config X86_CET + def_bool n + help + CET features configured (Shadow stack or IBT) + config X86_KERNEL_IBT prompt "Indirect Branch Tracking" def_bool y @@ -1858,6 +1863,7 @@ config X86_KERNEL_IBT # https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f depends on !LD_IS_LLD || LLD_VERSION >= 140000 select OBJTOOL + select X86_CET help Build the kernel with support for Indirect Branch Tracking, a hardware support course-grain forward-edge Control Flow Integrity @@ -1952,6 +1958,24 @@ config X86_SGX If unsure, say N. +config X86_USER_SHADOW_STACK + bool "X86 userspace shadow stack" + depends on AS_WRUSS + depends on X86_64 + select ARCH_USES_HIGH_VMA_FLAGS + select X86_CET + help + Shadow stack protection is a hardware feature that detects function + return address corruption. 
This helps mitigate ROP attacks. + Applications must be enabled to use it, and old userspace does not + get protection "for free". + + CPUs supporting shadow stacks were first released in 2020. + + See Documentation/x86/shstk.rst for more information. + + If unsure, say N. + config EFI bool "EFI runtime service support" depends on ACPI diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler index b88f784cb02e..8ad41da301e5 100644 --- a/arch/x86/Kconfig.assembler +++ b/arch/x86/Kconfig.assembler @@ -24,3 +24,8 @@ config AS_GFNI def_bool $(as-instr,vgf2p8mulb %xmm0$(comma)%xmm1$(comma)%xmm2) help Supported by binutils >= 2.30 and LLVM integrated assembler + +config AS_WRUSS + def_bool $(as-instr,wrussq %rax$(comma)(%rbx)) + help + Supported by binutils >= 2.31 and LLVM integrated assembler From patchwork Sun Mar 19 00:14:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0B6CC7619A for ; Sun, 19 Mar 2023 00:16:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6DB4A900004; Sat, 18 Mar 2023 20:16:03 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 63E21280001; Sat, 18 Mar 2023 20:16:03 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46A2B900006; Sat, 18 Mar 2023 20:16:03 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 2BA89900004 for ; Sat, 18 Mar 2023 20:16:03 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with 
ESMTP id D3A57120C75 for ; Sun, 19 Mar 2023 00:16:02 +0000 (UTC) X-FDA: 80583730164.16.1458D88 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf30.hostedemail.com (Postfix) with ESMTP id D291E80015 for ; Sun, 19 Mar 2023 00:16:00 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Pyt8CGtH; spf=pass (imf30.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184961; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=Fu0JiLzYUtOtJag4aNX/shhq8JJ4z7u8FfcAdekpma4=; b=mx8dlSCX6FISSSr4gLzZINvaERaeeXEWxvEO9/KI3Qa+W8J5T56IZzGs6nf+Ls8wA/R8Lv P0so0J2nbW31O105P40SnHR8ymlqX4tHQ6z1vt03PLv172LuGyjjXsh6YZOTAhblR2NpC/ JURhxgl2ZGXJ2sNr1AtYrsj2hQb1tOY= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Pyt8CGtH; spf=pass (imf30.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184961; a=rsa-sha256; cv=none; b=TuCeJpuXXYAM1WvG0LJ4cHoen10NFKMTqjj4tYO8zl9GgWOhbVNo1hyQMC+Gje6PCwX4Xk a127YQFpM1hcGHCwPpLObKcQNrj6mLMFMICtRXhNiHI6rL7PKCmNuMKvKPqv0JAG9eRjFt 04yK92jFoAVjGjuZiGy9Am2PvEWjZ6I= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184960; x=1710720960; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=DZhiQrXgICtOhDgJWbiC3iTfdlJMOxEjzVm3FJAu0Gg=; 
b=Pyt8CGtHtjREDDCOFQvRhAPAmUwcA1EAGsMBa1Xdj1pGJDykgj5aJbzL HWcc223OAX4d0HwhozfY6Lvjw6DgW8bliFBc+l+NEEGJXafJtOYeFBDrg 3kK7SdTFi05mU09M1Uye233g4yben1r+tPRP4N9Apntuft1ZXiAkQJBca FKG2bD+Sl42lnoHLQuhrqh1wgmih0QGGQNIYYk0bgYg0lt0gxXb/UNRpC ZelN0uVC/MjYr3jpaUPGRG0yBNpbl/UuuZGrjAFkymo00qssBPYzi9mOH XqpGIzgEWzJ8dfOuYD+4QFgM4aFUBi8cjxpK6d59a6RWdU9LzW6O5Sssb g==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338490790" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338490790" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:15:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672776" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672776" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:15:57 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 03/40] x86/cpufeatures: Add CPU feature flags for shadow stacks Date: Sat, 18 Mar 2023 17:14:58 -0700 Message-Id: <20230319001535.23210-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Queue-Id: D291E80015 X-Stat-Signature: ipis58znfrf8je5zwicnzoxber8xoeiu X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1679184960-733504 X-HE-Meta: U2FsdGVkX1+CAXLiE94MZFZEadryF3ipNHhl8I9y1jFG6K1T4rN6D5r+VfIHLmZ1V+U1ZjK4qFq4mLAjkBxdi7+/KJKHXRQHDv3Ezv0aB9Ta/B2JApLUYeLsuzcoY0pEqiRTvLS8Npacue3jyYdfNUtgrntkmwLC6v2nyGozXpArHBM0CMxDx04ZCUZhmJWvLGnnXlc/gfFDgC90Zyo1o/pe6wtkbEeWj7TVxfowJ1CnLc1Pwn9GX7jiRk2QJSd1ObL3iPrDW0BauUPU2fYEgmskxpcl8CjBc7HKNkXHFcur/ji2u+4nzqXVvMQPnVvdHoQ8Rk7SiAkyEsDjZyU1f8PM4MpCEfL3yxrFhy/XL1zAcLHkn/ijYXvTAfrVRFggMwyklBZX5X+1lcmrBTBEuMTasMCkQVnU/RcoG3RYHTNOsGNbojO7Xn8IMvRYU0DzRgKhFexB1l+I3GlcIH5HA6rkrnZp30epV5d9nALdcxm8/PvwRFpBhA2uzCoL5Dtk6PPXg6iej+ZPySed224dDWsRy6Nj524KWX0sBNgXkX+tzDBmky4Mqm3MXxQFZ5t8iKSczFFea+7cq+Rnv0VgeFZgVSfEMHZ83LNzrLeGXnFTYMgR9/84OK4omuYbkdPBSTWHITH3Z01N+RnMVJhULqVoLVp5vb5bJ97aDEkGzmZgHEFdYgLRGpQiiYE4DTX0Jm9UmbIwIuTO/Fa8q5HBudAeQlIpans5qzjEdDCgYilvIv40cJAirngYvXrqZI+fDH/L+uqDwGaJ1Z5ltDhYjUioV0rJs5gZWZp+uAuYwrN65na+lRwmev7mDjc96D8Myh1uS47QTOxAWTIHayG8rf18K7qdEBVkEQU0vFff+vbA82a+/h8BQJ1DPkQBVpsQTiDsoRKyfBxmrrDDhxjaHZAg71dnqjkisFj9rfaU/dD2+CHisKi+bJbqTGn5OnJiJ5g3wJTak/yh0P14AYy FUT+Swow 
XpH5mM+9aiXq4las1tnyPIjn9rSG8+rloYLk350L5faHmykO1zYm5CMGjsxYjkDxrBssY7Vqp6ByvplmoOfPxa9ZaRiccCeCFsjqzpFe/LX4baMr1/3GgA6OiF5y0G/HYdvIgKUz0jK0nGF4C6w//BYO6ixDNALdJo2mWrwN4TfTGfJnp83MNQazCb2t0VjOtXdJ2XzIQ6x2j0AReLlhi7qKOuzEjAT7pvQZi+1PKECpGv2NzQ0BAupaH11B1I8Epz0aN4UkianXgWlBXbgZ9f1jFxOWsTiClrEmrWplEMjC2ujIu22SEwgbU7KaTE9OSlL/Ga819uAndOcNM0EuaWor0uAq5JITEwl6o X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

The Control-Flow Enforcement Technology contains two related features, one of which is Shadow Stacks. Future patches will utilize this feature for shadow stack support in KVM, so add CPU feature flags for Shadow Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers are only accessible in supervisor mode. This implementation context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above, encompasses both supervisor and userspace support for shadow stack. In near future patches, only userspace shadow stack will be enabled. In expectation of future supervisor shadow stack support, create a software CPU capability to enumerate kernel utilization of userspace shadow stack support. This user shadow stack bit should depend on the HW "shstk" capability and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v5:
 - Drop "shstk" from cpuinfo (Boris)
 - Remove capitalization on shadow stack (Boris)

v3:
 - Add user specific shadow stack cpu cap (Andrew Cooper)
 - Drop reviewed-bys from Boris and Kees due to the above change.
v2: - Remove IBT reference in commit log (Kees) - Describe xsaves dependency using text from (Dave) v1: - Remove IBT, can be added in a follow on IBT series. --- arch/x86/include/asm/cpufeatures.h | 2 ++ arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/kernel/cpu/cpuid-deps.c | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 73c9672c123b..3d98ce9f41fe 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -309,6 +309,7 @@ #define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */ #define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */ #define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */ +#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ @@ -378,6 +379,7 @@ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_SHSTK (16*32+ 7) /* "" Shadow stack */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ #define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 5dfa4fb76f4b..505f78ddca82 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -99,6 +99,12 @@ # define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31)) #endif +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define 
DISABLE_USER_SHSTK 0 +#else +#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -114,7 +120,7 @@ #define DISABLED_MASK9 (DISABLE_SGX) #define DISABLED_MASK10 0 #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \ - DISABLE_CALL_DEPTH_TRACKING) + DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index f6748c8bd647..e462c1d3800a 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_XFD, X86_FEATURE_XSAVES }, { X86_FEATURE_XFD, X86_FEATURE_XGETBV1 }, { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD }, + { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES }, {} }; From patchwork Sun Mar 19 00:14:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180132 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C334C77B6F for ; Sun, 19 Mar 2023 00:16:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 41E02280002; Sat, 18 Mar 2023 20:16:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3A571280001; Sat, 18 Mar 2023 20:16:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 10EC5280002; Sat, 18 Mar 2023 20:16:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E7D2B280001 for ; Sat, 18 Mar 2023 20:16:03 -0400 (EDT) Received: from 
smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C580BC0CFF for ; Sun, 19 Mar 2023 00:16:03 +0000 (UTC) X-FDA: 80583730206.09.34067E9 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf05.hostedemail.com (Postfix) with ESMTP id BE7A0100007 for ; Sun, 19 Mar 2023 00:16:01 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=KBwzjQ6a; spf=pass (imf05.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184962; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=EAELIjQ8DyrYHPzfrIMRE/EUjFCHmLFlbgspe709h8s=; b=AA9egMDL0kF04kI8eD0dYlV7AGVMBVPHe7ioJrqy/ZlFAxo+3a56+RSWllYFnKQ1yi7X9Y k9sSWYYgBilB0Gz1uQcW0fY7orzhEbulMqv2plGuxiXpLrSGYAeDFPJzR2WYwE8rYIlAhe VV9wWhWBQyBeqTyh/k+gQpnoNqh+yak= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=KBwzjQ6a; spf=pass (imf05.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184962; a=rsa-sha256; cv=none; b=mJCyuB5htsDy7h+XS+0zMwUeNRYmEnvJE+uIZG8ZspmzbigdXTYaiupbwVAf8vDgBFsOfG jafjSKHKi0AofphLLEm+wPxVHHYEysgRANQNqBKvs/XovRJMLFdI+vI+/n1pvp0WYKGzWz nEMkwoCMMkejFh8BpQp+O01UTkLJxf4= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184961; x=1710720961; h=from:to:cc:subject:date:message-id:in-reply-to: 
references; bh=K4Y6KAE5IrGSFrjM9/a33nZFp30wbA4tT+ZAd3T/ndY=; b=KBwzjQ6a3mPMkIVALWYxlHJngm8nfIVax0VxeuJhq/rczMOJeTWMAJjS njSZXak8Nwus4OtCoIBLdl3X5PkaSaVqMpa2qAarE1dY+YlKb9uEGwKIX w0uHymqErEcIU1YL7iLwQWDydGEZSWpH3SnyVCyoVURAvFWLUkhUUBZ+R a6Zd1WOW9CIp0bPQXcLJ9+SjNDjn4+5CyRuWA5bH+T1xeH35Qzye7cDCx pKVOrt53C4CQ+6YRnqswghHxGcez27xyiUnvpgJaTO4SuUpoEip5QfHUT 3/jLiIuXy1+00E4HRRIfwoZIi+s17YBz1Zm40gQrjENpR5NuX5EFhijzh A==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338490814" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338490814" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:00 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672779" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672779" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:15:59 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 04/40] x86/cpufeatures: Enable CET CR4 bit for shadow stack Date: Sat, 18 Mar 2023 17:14:59 -0700 Message-Id: <20230319001535.23210-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: BE7A0100007 X-Stat-Signature: demwz4dwnmrutqumhb5q96ogpm47jruf X-Rspam-User: X-HE-Tag: 1679184961-618381 X-HE-Meta: U2FsdGVkX1+6iD0ALZqjZy0ANJEv+EZXoq30JtqeKA2qt61SdrHfJMoHdayjBFmh/pQWEi6QUhbpnXtUosg4JL5bbSqehV28Am78mBgu2txRHXW3BPaZUoRCl6Fsydcj8TiLbTyCqZ0lTdnr6eZbeyZAjpD+lzi5qxVUhjl+gcLqYTzq3QPkWzmBcO2UP3/wwxHJKvbIYQuTr/hFCN48uYGJH8YxAbFayarPf6Nlrnu8l4CpBWKVfMFfOO7SLhMXYF5J05jr/6ct3wVQBDzWPkR1YrxyAbXMksWo43VvATExtHPlJ7R2Ttvlf0VQunpHyPvu/w+47FbBTiMlTC/4sX4DYyNdkGH0nRu4wXPxn6yzWEJNwq5OSFi/0hVNJZzaiV0cpjy1IGmDEYKPsv+0i2Cw+qHYXX46fjV1KVJpTqG7e+X1BP+FXCyekSIV/3oQ85bnLp0T2YVhH4PCrtK8wX6oBTFpyd4Y3CcjSwolx+rcBOqfDlApaQFVu9XFGbpQRkEUZcYqHLNWxhT87hTNyggjTvqIhdysRA8hgjxABNUpN/BjxrtuwWWTNWmTmziLTXboNQr0Fi6ah3R+85zdXHuEieqvzYkNC2eHoVwZH3a0vyjnQyAE8GLWmaw7cCFJ0peEoQZNWEZrwhEqKdFZc/ihzNp/BcKlCjQzMyYax9VeP1FgGUGE+hhq9CXuuoVeWg8IyNu/fiX75L385L9W42AfB3UTbt9Ay2jF5lFZXl2pPedCVKtVEyRrQF1MT+/+GgBmySZPP0D45k9Wij7DK4Fuph/zMsIffOFZ8fyuzxhlG5TBH2XxXJj6i6ZIGhTv4u6Etd0pjlJaLwTg2iHqGpaZrN68ysCp/s+WXe0MUjNHdqZHwYUdXNn05qNWG8BQOI7kDAKcN78D+tjPx8He9eKB+7r00oR/UiBKPSPmJs5ySABX7c/Bct4RPrQu0/DXcSXZTJNfnl+HNOHaU6l 4Sj4bwcI 
2cDHHtyIZK46Bb8I9PdwWhweCGedwa/95nrB8TGkZuWLn7k5miEfP4B1bxDfUKppSYuwvL0sfbVruRfWRI2BKgA+mI7JbsRS41W+MEFa1TAfoe/mLgCPLnqTO5zeYwgBgbdbfyNDRY2HtIuFzDYWBz3xotsMQVgDGWZcJmIcdhiQxHnsHDUhACEH9YAz45vdHKKFOXny+zzMviJRbj30y6PDGTOLWieVQel/JV/lsc0Ii8udcMXwl9SOWYlSh4/8bM45AmmJsD/B0Jx5DcO2iJiLEQFr9MHuO295jiLJ6e0DsB2QrkEpLY/FKN21zxYWjJlvg5Nyko9KGPacLnmN5fPSqWv9mLLbLQ7ZY X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Setting CR4.CET is a prerequisite for utilizing any CET features, most of which also require setting MSRs. Kernel IBT already enables the CET CR4 bit when it detects IBT HW support and is configured with kernel IBT. However, future patches that enable userspace shadow stack support will need the bit set as well. So change the logic to enable it in either case. Clear MSR_IA32_U_CET in cet_disable() so that it can't live to see userspace in a new kexec-ed kernel that has CR4.CET set from kernel IBT. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v5: - Drop "shstk" from cpuinfo (Boris) - Remove capitalization on shadow stack (Boris) v3: - Add user specific shadow stack cpu cap (Andrew Cooper) - Drop reviewed-bys from Boris and Kees due to the above change. v2: - Remove IBT reference in commit log (Kees) - Describe xsaves dependency using text from (Dave) v1: - Remove IBT, can be added in a follow on IBT series. 
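
The decision flow that this patch gives setup_cet() can be modelled in user space. What follows is only a sketch under stated assumptions, not kernel code: struct demo_cet_inputs, struct demo_cet_result and demo_setup_cet() are hypothetical names, the MSR and CR4 writes are replaced by plain fields, and CONFIG_X86_CET is assumed to be enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs: what the CPU reports and how the kernel was built. */
struct demo_cet_inputs {
	bool cpu_has_ibt;	/* stands in for X86_FEATURE_IBT */
	bool cpu_has_shstk;	/* stands in for X86_FEATURE_SHSTK */
	bool cfg_kernel_ibt;	/* stands in for HAS_KERNEL_IBT */
	bool cfg_user_shstk;	/* stands in for CONFIG_X86_USER_SHADOW_STACK */
	bool ibt_selftest_ok;	/* result of the IBT selftest */
};

/* Hypothetical outputs: the state the real function would program. */
struct demo_cet_result {
	bool cr4_cet;		/* X86_CR4_CET set */
	bool s_cet_endbr;	/* MSR_IA32_S_CET holds CET_ENDBR_EN */
	bool user_shstk_cap;	/* X86_FEATURE_USER_SHSTK set */
	bool ibt_cap_cleared;	/* X86_FEATURE_IBT cleared after a failed selftest */
};

static struct demo_cet_result demo_setup_cet(const struct demo_cet_inputs *in)
{
	struct demo_cet_result out = { false, false, false, false };
	bool kernel_ibt = in->cfg_kernel_ibt && in->cpu_has_ibt;
	bool user_shstk = in->cpu_has_shstk && in->cfg_user_shstk;

	/* Neither user of CET is present: leave CR4.CET alone. */
	if (!kernel_ibt && !user_shstk)
		return out;

	out.user_shstk_cap = user_shstk;
	out.s_cet_endbr = kernel_ibt;	/* otherwise the supervisor MSR is zeroed */
	out.cr4_cet = true;		/* set in either case */

	/* A failed selftest only turns kernel IBT off again. */
	if (kernel_ibt && !in->ibt_selftest_ok) {
		out.s_cet_endbr = false;
		out.ibt_cap_cleared = true;
	}

	/* In this model, ENDBR tracking is never on without CR4.CET. */
	assert(!out.s_cet_endbr || out.cr4_cet);
	return out;
}
```

With only the shadow stack inputs set, the model reports CR4.CET and the user shadow stack capability as enabled while supervisor ENDBR tracking stays off, which is the new case this patch introduces.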
--- arch/x86/kernel/cpu/common.c | 35 +++++++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 8 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8cd4126d8253..cc686e5039be 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -600,27 +600,43 @@ __noendbr void ibt_restore(u64 save) static __always_inline void setup_cet(struct cpuinfo_x86 *c) { - u64 msr = CET_ENDBR_EN; + bool user_shstk, kernel_ibt; - if (!HAS_KERNEL_IBT || - !cpu_feature_enabled(X86_FEATURE_IBT)) + if (!IS_ENABLED(CONFIG_X86_CET)) return; - wrmsrl(MSR_IA32_S_CET, msr); + kernel_ibt = HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT); + user_shstk = cpu_feature_enabled(X86_FEATURE_SHSTK) && + IS_ENABLED(CONFIG_X86_USER_SHADOW_STACK); + + if (!kernel_ibt && !user_shstk) + return; + + if (user_shstk) + set_cpu_cap(c, X86_FEATURE_USER_SHSTK); + + if (kernel_ibt) + wrmsrl(MSR_IA32_S_CET, CET_ENDBR_EN); + else + wrmsrl(MSR_IA32_S_CET, 0); + cr4_set_bits(X86_CR4_CET); - if (!ibt_selftest()) { + if (kernel_ibt && !ibt_selftest()) { pr_err("IBT selftest: Failed!\n"); wrmsrl(MSR_IA32_S_CET, 0); setup_clear_cpu_cap(X86_FEATURE_IBT); - return; } } __noendbr void cet_disable(void) { - if (cpu_feature_enabled(X86_FEATURE_IBT)) - wrmsrl(MSR_IA32_S_CET, 0); + if (!(cpu_feature_enabled(X86_FEATURE_IBT) || + cpu_feature_enabled(X86_FEATURE_SHSTK))) + return; + + wrmsrl(MSR_IA32_S_CET, 0); + wrmsrl(MSR_IA32_U_CET, 0); } /* @@ -1482,6 +1498,9 @@ static void __init cpu_parse_early_param(void) if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + if (cmdline_find_option_bool(boot_command_line, "nousershstk")) + setup_clear_cpu_cap(X86_FEATURE_USER_SHSTK); + arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg)); if (arglen <= 0) return; From patchwork Sun Mar 19 00:15:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 440AAC7618A for ; Sun, 19 Mar 2023 00:16:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5F196280003; Sat, 18 Mar 2023 20:16:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 57A3A280001; Sat, 18 Mar 2023 20:16:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 33242280003; Sat, 18 Mar 2023 20:16:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 1AD04280001 for ; Sat, 18 Mar 2023 20:16:06 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id E4AED1C6411 for ; Sun, 19 Mar 2023 00:16:05 +0000 (UTC) X-FDA: 80583730290.08.EBCDA4F Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf05.hostedemail.com (Postfix) with ESMTP id DEF83100007 for ; Sun, 19 Mar 2023 00:16:03 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=nnCzbCgB; spf=pass (imf05.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184964; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; 
bh=x+DZu64AZlWqg9jI5n7i6mEJSBUcY0ZFP1UxEduBU74=; b=QkevZD2jav3jmOBo1EAZjg+aGmNhhK4bvO70HX+jx+TfMU0AcvqX7fjbAeBWEG2vDPmIWX ioYo3aKtL1opAN74kp9OiTLFmUambaNfJ/iDe8mCDKc59iQAyX9Mc+WoIQH/rPnrw34hac W73ngSEnJQMsxI5lfNistGJWT5p9PKc= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=nnCzbCgB; spf=pass (imf05.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184964; a=rsa-sha256; cv=none; b=dScH65JovuxbhdAU/Am4OwFZVcnjclDJtBAjYwlL+6+JBDCJIAFtO90pfi0vhlXxIf6qQo 45fbqVqbSYhf3aYk2FUUupn/g7GA4V8pzVYd0XI8CWpOPHzzWbfJT/9t9IEylhScRN2Mu2 TcBOCcoehWGAzpP3cyhtpAC0+wcnL88= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184963; x=1710720963; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=tYOU3V6KmQtESAJgUK7c/60CO62FCAwvULJrBjiqxhw=; b=nnCzbCgB1KMIK0eyp1iVLirkJ9/I4DreV3eIazQKkWz9qjDZdtIhz7TT mofe7kzXnhXuZ7nobPndp61Dsh8TLzz7GSZ1VGQR+yvmal82uIdjG+RR9 1Z8xzSatIHjS5P78epQTSEJJDP7ShGN3PjTT9ctVkH1gziUyXVJWZvu2B aQHKLoQaDfC/N1tB6vMH0DWJ0FqmlasfA8vUSM0Prx45ghyWMgX/BsaW6 NYnqcV8FFjXC7A5Rmh+GXdEKBGb4JHOFvtNmw7XUkeDOa0Ujk1bz2v0Y7 UbYLeqNVdNV2HoNcF90d7LLCzAoOolXCRSAkcqmSvt43sN1Orza/BqHCm Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338490836" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338490836" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:02 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672783" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672783" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:00 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 05/40] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Sat, 18 Mar 2023 17:15:00 -0700 Message-Id: <20230319001535.23210-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DEF83100007 X-Stat-Signature: it75hpkx5p4ux7yybbjncqgwhgnhoen3 X-Rspam-User: X-HE-Tag: 1679184963-347101 X-HE-Meta: 
U2FsdGVkX18F+ieBtL8bG16ozCyzle+B6ivUHjMeMqc89zNt92YMXG/phxo48rMP1UHzCe+KGvt7DhuVhbjQXIF7RF0NvrSyMpLsOSbS/oSMnk9XDeezrx7l+nGPBoYl+BmwzuwrEJ/YsDfOTi9ZqtUHDA0ZHf0KXLSfxzQOtCu4owReOGUtqiVe1e3t+8A0zHwViqN9XzemBHPooy/IXPBSYb6R/2y0W7W3cxlWyEvB8SXoaGlRo4qEKSuAdqiXcvedb8NUED5jvPBnU8CVnKb6NySEhori7OLL+qUl5tzm1pYQ60oswaM+hB26crId4pPGk3K/ufmL3crOPjyBiQ78uxQCVCTFVWJBdRB9Y/QyPpoENjZWKxb/TlquhBjz2PjlDLp3IfrRSyT7enPzOk+U6EHGs+JufIs7yhHSPiYRpt37REln2eFCXcorXUVlWaj1kRObr9egQi6a0V/Z+bg0qKKaL/k6Cep/bpYoHLufF/IL08de1oRr0DC/w1lALCa8WkfM+o7DXjQZiYH+9jqNlCU1I4vCOfq29rO0kyHQiHhf4S7ewMpxl1qf+JEB9EI2jjZhxXVmKDGPA5KKKdzt/pPjnLR5PSQMt4c4BXTvSpSWGhJvq7eko1estbg8xcO/JSxGEve7IhDq5nBkp7qOpdWQxGOpP2TkTygpB/6KIkrWODzr0s/85BiP4U5oTMSagR3WyCvEcPb+90Nb6Lg4CPrDVFDY8+HryXdNzldxHGj6SOoJ9wgAw7YgOKR9D2kgeTxh/zEtuV4+8JLNuzFRhL86hXtA4aGMqnPLmS3NIlV9ZmWovk9M9uiMOlv6db7E5i8zqn0bCUndU4UrL7I74rs0MJPrypXL0MWo5Yu6ovpLNqJRJ4mG9Q02FayVqYPPlxpZB6i2GMEN82MRkWJoWgUe7E1VYQATxyVMa/uEgpBs2tkOSyE7Up9Dwc5dpkQMwtsXFylQiHsn8o2 Nf+y1dmG A+ViLQNzLPPImb5LJm5QsvusyAWspU4aUqS5zwhIR6OsRL/+Ya1vgwRdYcFpE2H2eqA29tdh/g577U8AXbJLS+MYwsP6drMC9tZVe/zYLJ8eIQeFKHVz+4veGhbcM9wTd43Zsb8TkZbiiFJkY+c2pAfE/tSDIz6swjohykMSI3VtSsIBK9HSoN9yZM6SIFJX9HL2vDzk++qEUwEnh6IZnJtlALsE1ASlUcKD2pg6ib3xpJoRfhQdl3jHrT3sQLXShAPgusJ06uHhYWfJWVOu3K7fpRro2y56Qd+g90/Vp3tCOiq/lNtuWeYFIAQdho4bG7Js3inOodu0iNtHtwF5AF6og/MIY6Lkpjpk6 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Shadow stack register state can be managed with XSAVE. The registers can logically be separated into two groups:

        * Registers controlling user-mode operation
        * Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of those groups of registers. This lets an OS manage them separately if it chooses.
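
Because the two register groups are separate state components, each one has its own bit in an XSAVE feature mask, and an OS can select one without the other. A minimal user-space sketch, assuming component numbers 11 and 12 (the positions that XFEATURE_CET_USER and XFEATURE_CET_KERNEL_UNUSED take in enum xfeature in this patch); every DEMO_/demo_ name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed state component numbers, following enum xfeature in this patch. */
enum demo_xfeature {
	DEMO_XFEATURE_CET_USER   = 11,
	DEMO_XFEATURE_CET_KERNEL = 12,
};

#define DEMO_MASK_CET_USER   (UINT64_C(1) << DEMO_XFEATURE_CET_USER)
#define DEMO_MASK_CET_KERNEL (UINT64_C(1) << DEMO_XFEATURE_CET_KERNEL)

/* Return true if state component @nr is selected in @mask. */
static bool demo_component_enabled(uint64_t mask, int nr)
{
	assert(nr >= 0 && nr < 64);
	return (mask >> nr) & 1;
}
```

A mask equal to DEMO_MASK_CET_USER selects the user-mode registers only, which corresponds to the configuration this patch chooses.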
Future patches for host userspace and KVM guests will only utilize the user-mode registers, so only configure XSAVE to save user-mode registers. This state will add 16 bytes to the xsave buffer size.

Future patches will use the user-mode XSAVE area to save guest user-mode CET state. However, VMCS includes new fields for guest CET supervisor states. KVM can use these to save and restore guest supervisor state, so host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in check_xstate_against_struct() that handles warning about unimplemented xfeatures. So refactor these checks into a switch statement, with XCHECK_SZ() evaluating to true once it has actually checked the xfeature. Some lines end up exceeding 80 chars, but this was better on balance than other options explored.

While configuring user-mode XSAVE, clarify that kernel-mode registers are not managed by XSAVE by defining the xfeature in XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT. This serves more of a documentation-as-code purpose; functionally, it only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state controlling user-mode operation. This is a departure from earlier features like protection keys, where the PKRU state is a normal user (non-supervisor) state. Having the user state be supervisor-managed ensures there is no direct, unprivileged access to it, making it harder for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs, and the bits defined in those MSRs relevant to future shadow stack enablement patches.
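
The 16-byte figure and the switch-based size check in the diff below can be illustrated in user space. This is a sketch under stated assumptions rather than the kernel implementation: the demo_ names are hypothetical, a counter stands in for the kernel's warning macros, and the component numbers 11 and 12 are taken from the enum xfeature change in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct cet_user_state in the diff: two 64-bit fields. */
struct demo_cet_user_state {
	uint64_t user_cet;	/* user control-flow settings */
	uint64_t user_ssp;	/* user shadow stack pointer */
};

/* The commit message says this state adds 16 bytes to the xsave buffer. */
static_assert(sizeof(struct demo_cet_user_state) == 16,
	      "CET user state is expected to be 16 bytes");

enum { DEMO_XFEATURE_CET_USER = 11, DEMO_XFEATURE_CET_KERNEL = 12 };

/* Stand-in for the kernel's warning macros: just count the warnings. */
static int demo_warnings;

/* Warn on a size mismatch, then report that the component was checked. */
static bool demo_xcheck_sz(size_t cpu_size, size_t struct_size)
{
	if (cpu_size != struct_size)
		demo_warnings++;
	return true;
}

/* One case per component that has a structure; anything else warns. */
static bool demo_check_xstate(int nr, size_t cpu_size)
{
	switch (nr) {
	case DEMO_XFEATURE_CET_USER:
		return demo_xcheck_sz(cpu_size, sizeof(struct demo_cet_user_state));
	default:
		demo_warnings++;	/* no structure known for this component */
		return false;
	}
}
```

In this model a size mismatch is reported but still counts as "checked", while a component without a structure, such as the unused kernel-mode one, falls through to the default case.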
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v5: - Move comments from end of lines in cet_user_state struct (Boris) v3: - Add missing "is" in commit log (Boris) - Change to case statement for struct size checking (Boris) - Adjust commas on xfeature_names (Kees, Boris) v2: - Change name to XFEATURE_CET_KERNEL_UNUSED (peterz) KVM refresh: - Reword commit log using some verbiage posted by Dave Hansen - Remove unlikely to be used supervisor cet xsave struct - Clarify that supervisor cet state is not saved by xsave - Remove unused supervisor MSRs --- arch/x86/include/asm/fpu/types.h | 16 +++++- arch/x86/include/asm/fpu/xstate.h | 6 ++- arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++---------------- 3 files changed, 61 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index 7f6d858ff47a..eb810074f1e7 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -115,8 +115,8 @@ enum xfeature { XFEATURE_PT_UNIMPLEMENTED_SO_FAR, XFEATURE_PKRU, XFEATURE_PASID, - XFEATURE_RSRVD_COMP_11, - XFEATURE_RSRVD_COMP_12, + XFEATURE_CET_USER, + XFEATURE_CET_KERNEL_UNUSED, XFEATURE_RSRVD_COMP_13, XFEATURE_RSRVD_COMP_14, XFEATURE_LBR, @@ -138,6 +138,8 @@ enum xfeature { #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR) #define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU) #define XFEATURE_MASK_PASID (1 << XFEATURE_PASID) +#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER) +#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED) #define XFEATURE_MASK_LBR (1 << XFEATURE_LBR) #define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG) #define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA) @@ -252,6 +254,16 @@ struct pkru_state { u32 pad; } __packed; +/* + * State component 11 is Control-flow Enforcement user 
states + */ +struct cet_user_state { + /* user control-flow settings */ + u64 user_cet; + /* user shadow stack pointer */ + u64 user_ssp; +}; + /* * State component 15: Architectural LBR configuration state. * The size of Arch LBR state depends on the number of LBRs (lbr_depth). diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h index cd3dd170e23a..d4427b88ee12 100644 --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -50,7 +50,8 @@ #define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA /* All currently supported supervisor features */ -#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID) +#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER) /* * A supervisor state component may not always contain valuable information, @@ -77,7 +78,8 @@ * Unsupported supervisor features. When a supervisor feature in this mask is * supported in the future, move it to the supported supervisor feature mask. */ -#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT) +#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \ + XFEATURE_MASK_CET_KERNEL) /* All supervisor states including supported and unsupported states. 
*/ #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 714166cc25f2..13a80521dd51 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -39,26 +39,26 @@ */ static const char *xfeature_names[] = { - "x87 floating point registers" , - "SSE registers" , - "AVX registers" , - "MPX bounds registers" , - "MPX CSR" , - "AVX-512 opmask" , - "AVX-512 Hi256" , - "AVX-512 ZMM_Hi256" , - "Processor Trace (unused)" , + "x87 floating point registers", + "SSE registers", + "AVX registers", + "MPX bounds registers", + "MPX CSR", + "AVX-512 opmask", + "AVX-512 Hi256", + "AVX-512 ZMM_Hi256", + "Processor Trace (unused)", "Protection Keys User registers", "PASID state", - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "unknown xstate feature" , - "AMX Tile config" , - "AMX Tile data" , - "unknown xstate feature" , + "Control-flow User registers", + "Control-flow Kernel registers (unused)", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "unknown xstate feature", + "AMX Tile config", + "AMX Tile data", + "unknown xstate feature", }; static unsigned short xsave_cpuid_features[] __initdata = { @@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = { [XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT, [XFEATURE_PKRU] = X86_FEATURE_PKU, [XFEATURE_PASID] = X86_FEATURE_ENQCMD, + [XFEATURE_CET_USER] = X86_FEATURE_SHSTK, [XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE, [XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE, }; @@ -276,6 +277,7 @@ static void __init print_xstate_features(void) print_xstate_feature(XFEATURE_MASK_Hi16_ZMM); print_xstate_feature(XFEATURE_MASK_PKRU); print_xstate_feature(XFEATURE_MASK_PASID); + print_xstate_feature(XFEATURE_MASK_CET_USER); print_xstate_feature(XFEATURE_MASK_XTILE_CFG); 
print_xstate_feature(XFEATURE_MASK_XTILE_DATA); } @@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate) XFEATURE_MASK_BNDREGS | \ XFEATURE_MASK_BNDCSR | \ XFEATURE_MASK_PASID | \ + XFEATURE_MASK_CET_USER | \ XFEATURE_MASK_XTILE) /* @@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void) } \ } while (0) -#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \ - if ((nr == nr_macro) && \ - WARN_ONCE(sz != sizeof(__struct), \ - "%s: struct is %zu bytes, cpu state %d bytes\n", \ - __stringify(nr_macro), sizeof(__struct), sz)) { \ +#define XCHECK_SZ(sz, nr, __struct) ({ \ + if (WARN_ONCE(sz != sizeof(__struct), \ + "[%s]: struct is %zu bytes, cpu state %d bytes\n", \ + xfeature_names[nr], sizeof(__struct), sz)) { \ __xstate_dump_leaves(); \ } \ -} while (0) + true; \ +}) + /** * check_xtile_data_against_struct - Check tile data state size. @@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr) * Ask the CPU for the size of the state. */ int sz = xfeature_size(nr); + /* * Match each CPU state with the corresponding software * structure. */ - XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct); - XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state); - XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state); - XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state); - XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state); - XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state); - XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state); - XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state); - XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg); - - /* The tile data size varies between implementations. */ - if (nr == XFEATURE_XTILE_DATA) - check_xtile_data_against_struct(sz); - - /* - * Make *SURE* to add any feature numbers in below if - * there are "holes" in the xsave state component - * numbers. 
-	 */
-	if ((nr < XFEATURE_YMM) ||
-	    (nr >= XFEATURE_MAX) ||
-	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	switch (nr) {
+	case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct);
+	case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+	case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+	case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+	case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+	case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+	case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state);
+	case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+	case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
+	case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+	case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+	default:
 		XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
 		return false;
 	}
+
 	return true;
 }

From patchwork Sun Mar 19 00:15:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180134
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 06/40] x86/fpu: Add helper for modifying xstate
Date: Sat, 18 Mar 2023 17:15:01 -0700
Message-Id: <20230319001535.23210-7-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

Just like user xfeatures, supervisor xfeatures can be active in the
registers or present in the task FPU buffer. If the registers are
active, the registers can be modified directly. If the registers are
not active, the modification must be performed on the task FPU buffer.

When the state is not active, the kernel could perform modifications
directly to the buffer. But in order for it to do that, it needs
to know where in the buffer the specific state it wants to modify is
located. Doing this is not robust against optimizations that compact
the FPU buffer, as each access would require computing where in the
buffer it is.

The easiest way to modify supervisor xfeature data is to force restore
the registers and write directly to the MSRs. Often this is just fine
anyway as the registers need to be restored before returning to
userspace. Do this for now, leaving buffer writing optimizations for the
future.

Add a new function fpregs_lock_and_load() that can simultaneously call
fpregs_lock() and do this restore. Also perform some extra sanity
checks in this function since this will be used in non-FPU-focused code.
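
The lock, restore, modify, unlock sequence described above can be sketched
as a small userspace model. This is only an illustration under stated
assumptions: struct mock_task and every mock_*() function are hypothetical
stand-ins invented for this sketch, not kernel APIs. They model, loosely,
TIF_NEED_FPU_LOAD, fpregs_lock_and_load() and fpregs_unlock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one task's supervisor xfeature state. */
struct mock_task {
	bool need_load;    /* models TIF_NEED_FPU_LOAD: valid state is in the buffer */
	bool locked;       /* true between mock_lock_and_load() and mock_unlock() */
	uint64_t buf_msr;  /* value saved in the (mock) task FPU buffer */
	uint64_t regs_msr; /* value live in the (mock) register/MSR */
};

/* Models fpregs_lock_and_load(): lock, then restore if the buffer is current. */
static void mock_lock_and_load(struct mock_task *t)
{
	assert(!t->locked);
	t->locked = true;
	if (t->need_load) {
		t->regs_msr = t->buf_msr;
		t->need_load = false;
	}
}

/* Models fpregs_unlock(). */
static void mock_unlock(struct mock_task *t)
{
	assert(t->locked);
	t->locked = false;
}

/* A caller modifies the state in the registers, never in the buffer. */
static void mock_set_feature_bits(struct mock_task *t, uint64_t bits)
{
	mock_lock_and_load(t);
	t->regs_msr |= bits;
	mock_unlock(t);
}

/* Runs the model once: state starts in the buffer, ends up in the registers. */
static uint64_t mock_demo(void)
{
	struct mock_task t = { .need_load = true, .buf_msr = 0x1 };

	mock_set_feature_bits(&t, 0x2);
	return t.regs_msr;
}
```

In this model the caller never needs to know where the state sits inside
the buffer, which is the property the commit message argues for.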

Suggested-by: Thomas Gleixner
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v6:
 - Drop "but appear to work" (Boris)

v5:
 - Fix spelling error (Boris)
 - Don't export fpregs_lock_and_load() (Boris)

v3:
 - Rename to fpregs_lock_and_load() to match the unlocking
   fpregs_unlock(). (Kees)
 - Elaborate in comment about helper. (Dave)

v2:
 - Drop optimization of writing directly the buffer, and change API
   accordingly.
 - fpregs_lock_and_load() suggested by tglx
 - Some commit log verbiage from dhansen
---
 arch/x86/include/asm/fpu/api.h |  9 +++++++++
 arch/x86/kernel/fpu/core.c     | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 503a577814b2..aadc6893dcaa 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -82,6 +82,15 @@ static inline void fpregs_unlock(void)
 	preempt_enable();
 }
 
+/*
+ * FPU state gets lazily restored before returning to userspace. So when in the
+ * kernel, the valid FPU state may be kept in the buffer. This function will force
+ * restore all the fpu state to the registers early if needed, and lock them from
+ * being automatically saved/restored. Then FPU state can be modified safely in the
+ * registers, before unlocking with fpregs_unlock().
+ */
+void fpregs_lock_and_load(void);
+
 #ifdef CONFIG_X86_DEBUG_FPU
 extern void fpregs_assert_state_consistent(void);
 #else
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index caf33486dc5e..f851558b673f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -753,6 +753,24 @@ void switch_fpu_return(void)
 }
 EXPORT_SYMBOL_GPL(switch_fpu_return);
 
+void fpregs_lock_and_load(void)
+{
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * in an interrupt could screw up some in progress fpregs operation.
+	 * Warn about it.
+	 */
+	WARN_ON_ONCE(!irq_fpu_usable());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	if (test_thread_flag(TIF_NEED_FPU_LOAD))
+		fpregs_restore_userregs();
+}
+
 #ifdef CONFIG_X86_DEBUG_FPU
 /*
  * If current FPU state according to its tracking (loaded FPU context on this

From patchwork Sun Mar 19 00:15:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180135
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 07/40] x86/traps: Move control protection handler to separate file
Date: Sat, 18 Mar 2023 17:15:02 -0700
Message-Id: <20230319001535.23210-8-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

Today the control protection handler is defined in traps.c and used only
for the kernel IBT feature. To reduce ifdeffery, move it to its own file.
In future patches, functionality will be added to make this handler also
handle user shadow stack faults. So name the file cet.c.

No functional change.

Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Add "x86/traps" to log (Boris)

v6:
 - Split move to cet.c and shadow stack enhancements to fault handler to
   separate files. (Kees)
---
 arch/x86/kernel/Makefile |  2 ++
 arch/x86/kernel/cet.c    | 76 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c  | 75 ---------------------------------------
 3 files changed, 78 insertions(+), 75 deletions(-)
 create mode 100644 arch/x86/kernel/cet.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index dd61752f4c96..92446f1dedd7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -144,6 +144,8 @@ obj-$(CONFIG_CFI_CLANG) += cfi.o
 
 obj-$(CONFIG_CALL_THUNKS) += callthunks.o
 
+obj-$(CONFIG_X86_CET) += cet.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
new file mode 100644
index 000000000000..7ad22b705b64
--- /dev/null
+++ b/arch/x86/kernel/cet.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+
+static __ro_after_init bool ibt_fatal = true;
+
+extern void ibt_selftest_ip(void); /* code label defined in asm below */
+
+enum cp_error_code {
+	CP_EC = (1 << 15) - 1,
+
+	CP_RET = 1,
+	CP_IRET = 2,
+	CP_ENDBR = 3,
+	CP_RSTRORSSP = 4,
+	CP_SETSSBSY = 5,
+
+	CP_ENCL = 1 << 15,
+};
+
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
+		pr_err("Unexpected #CP\n");
+		BUG();
+	}
+
+	if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR))
+		return;
+
+	if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) {
+		regs->ax = 0;
+		return;
+	}
+
+	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
+	if (!ibt_fatal) {
+		printk(KERN_DEFAULT CUT_HERE);
+		__warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL);
+		return;
+	}
+	BUG();
+}
+
+/* Must be noinline to ensure uniqueness of ibt_selftest_ip. */
+noinline bool ibt_selftest(void)
+{
+	unsigned long ret;
+
+	asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t"
+	     ANNOTATE_RETPOLINE_SAFE
+	     " jmp *%%rax\n\t"
+	     "ibt_selftest_ip:\n\t"
+	     UNWIND_HINT_FUNC
+	     ANNOTATE_NOENDBR
+	     " nop\n\t"
+
+	     : "=a" (ret) : : "memory");
+
+	return !ret;
+}
+
+static int __init ibt_setup(char *str)
+{
+	if (!strcmp(str, "off"))
+		setup_clear_cpu_cap(X86_FEATURE_IBT);
+
+	if (!strcmp(str, "warn"))
+		ibt_fatal = false;
+
+	return 1;
+}
+
+__setup("ibt=", ibt_setup);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d317dc3d06a3..cc223e60aba2 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -213,81 +213,6 @@ DEFINE_IDTENTRY(exc_overflow)
 	do_error_trap(regs, 0, "overflow", X86_TRAP_OF, SIGSEGV, 0, NULL);
 }
 
-#ifdef CONFIG_X86_KERNEL_IBT
-
-static __ro_after_init bool ibt_fatal = true;
-
-extern void ibt_selftest_ip(void); /* code label defined in asm below */
-
-enum cp_error_code {
-	CP_EC = (1 << 15) - 1,
-
-	CP_RET = 1,
-	CP_IRET = 2,
-	CP_ENDBR = 3,
-	CP_RSTRORSSP = 4,
-	CP_SETSSBSY = 5,
-
-	CP_ENCL = 1 << 15,
-};
-
-DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
-{
-	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
-		pr_err("Unexpected #CP\n");
-		BUG();
-	}
-
-	if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR))
-		return;
-
-	if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) {
-		regs->ax = 0;
-		return;
-	}
-
-	pr_err("Missing ENDBR: %pS\n", (void *)instruction_pointer(regs));
-	if (!ibt_fatal) {
-		printk(KERN_DEFAULT CUT_HERE);
-		__warn(__FILE__, __LINE__, (void *)regs->ip, TAINT_WARN, regs, NULL);
-		return;
-	}
-	BUG();
-}
-
-/* Must be noinline to ensure uniqueness of ibt_selftest_ip. */
-noinline bool ibt_selftest(void)
-{
-	unsigned long ret;
-
-	asm (" lea ibt_selftest_ip(%%rip), %%rax\n\t"
-	     ANNOTATE_RETPOLINE_SAFE
-	     " jmp *%%rax\n\t"
-	     "ibt_selftest_ip:\n\t"
-	     UNWIND_HINT_FUNC
-	     ANNOTATE_NOENDBR
-	     " nop\n\t"
-
-	     : "=a" (ret) : : "memory");
-
-	return !ret;
-}
-
-static int __init ibt_setup(char *str)
-{
-	if (!strcmp(str, "off"))
-		setup_clear_cpu_cap(X86_FEATURE_IBT);
-
-	if (!strcmp(str, "warn"))
-		ibt_fatal = false;
-
-	return 1;
-}
-
-__setup("ibt=", ibt_setup);
-
-#endif /* CONFIG_X86_KERNEL_IBT */
-
 #ifdef CONFIG_X86_F00F_BUG
 void handle_invalid_op(struct pt_regs *regs)
 #else

From patchwork Sun Mar 19 00:15:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180136
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 08/40] x86/shstk: Add user control-protection fault handler
Date: Sat, 18 Mar 2023 17:15:03 -0700
Message-Id: <20230319001535.23210-9-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

A control-protection fault is triggered when a control-flow transfer
attempt violates Shadow Stack or Indirect Branch Tracking constraints.
For example, the return address for a RET instruction differs from the copy
on the shadow stack.

There already exists a control-protection fault handler for handling kernel
IBT faults. Refactor this fault handler into separate user and kernel
handlers, like the page fault handler. Add a control-protection handler
for usermode. To avoid ifdeffery, put them both in a new file cet.c, which
is compiled in the case of either of the two CET features supported in the
kernel: kernel IBT or user mode shadow stack. Move some static inline
functions from traps.c into a header so they can be used in cet.c.

Opportunistically fix a comment in the kernel IBT part of the fault
handler that is on the end of the line instead of preceding it.

Keep the same behavior for the kernel side of the fault handler, except for
converting a BUG to a WARN in the case of a #CP happening when the feature
is missing. This unifies the behavior with the new shadow stack code, and
also prevents the kernel from crashing under this situation which is
potentially recoverable.

The control-protection fault handler works in a similar way to the general
protection fault handler. It provides the si_code SEGV_CPERR to the signal
handler.
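
From the userspace side, the new si_code can be told apart from other
SIGSEGV causes in an SA_SIGINFO handler. The following is a minimal sketch,
not part of this patch. It assumes SEGV_CPERR has the value 10 (consistent
with NSIGSEGV becoming 10 in this patch) and supplies a fallback define for
libc headers that lack it. The names segv_cause(), on_segv() and
install_segv_handler() are hypothetical helpers invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifndef SEGV_CPERR
#define SEGV_CPERR 10 /* assumption: the uapi value introduced by this patch */
#endif

/* Hypothetical helper: map a SIGSEGV si_code to a short description. */
static const char *segv_cause(int si_code)
{
	switch (si_code) {
	case SEGV_MAPERR: return "address not mapped";
	case SEGV_ACCERR: return "invalid permissions";
	case SEGV_CPERR:  return "control protection fault";
	default:          return "other";
	}
}

/* SA_SIGINFO handler: report the cause and exit. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
	const char *msg = segv_cause(info->si_code);

	(void)sig;
	(void)ucontext;
	/* write() and _exit() are async-signal-safe; printf() is not. */
	if (write(STDERR_FILENO, msg, strlen(msg)) < 0)
		_exit(2);
	_exit(1);
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on error. */
static int install_segv_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

A program would call install_segv_handler() early in main(). Whether a
shadow stack violation is ever delivered depends on hardware and kernel
support, which this sketch does not probe.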

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v7:
 - Adjust alignment of WARN statement

v6:
 - Split into separate patches (Kees)
 - Change to "x86/shstk" in commit log (Boris)

v5:
 - Move to separate file to avoid ifdeffery (Boris)
 - Improvements to commit log (Boris)
 - Rename control_protection_err (Boris)
 - Move comment from end of line in IBT fault handler (Boris)

v3:
 - Shorten user/kernel #CP handler function names (peterz)
 - Restore CP_ENDBR check to kernel handler (peterz)
 - Utilize CONFIG_X86_CET (Kees)
 - Unify "unexpected" warnings (Andrew Cooper)
 - Use 2d array for error code chars (Andrew Cooper)
 - Add comment about why to read SSP MSR before enabling interrupts

v2:
 - Integrate with kernel IBT fault handler
 - Update printed messages. (Dave)
 - Remove array_index_nospec() usage. (Dave)
 - Remove IBT messages. (Dave)
 - Add enclave error code bit processing in case it can get triggered
   somehow.
 - Add extra "unknown" in control_protection_err.
---
 arch/arm/kernel/signal.c                 |  2 +-
 arch/arm64/kernel/signal.c               |  2 +-
 arch/arm64/kernel/signal32.c             |  2 +-
 arch/sparc/kernel/signal32.c             |  2 +-
 arch/sparc/kernel/signal_64.c            |  2 +-
 arch/x86/include/asm/disabled-features.h |  8 +-
 arch/x86/include/asm/idtentry.h          |  2 +-
 arch/x86/include/asm/traps.h             | 12 +++
 arch/x86/kernel/cet.c                    | 94 +++++++++++++++++++++---
 arch/x86/kernel/idt.c                    |  2 +-
 arch/x86/kernel/signal_32.c              |  2 +-
 arch/x86/kernel/signal_64.c              |  2 +-
 arch/x86/kernel/traps.c                  | 12 ---
 arch/x86/xen/enlighten_pv.c              |  2 +-
 arch/x86/xen/xen-asm.S                   |  2 +-
 include/uapi/asm-generic/siginfo.h       |  3 +-
 16 files changed, 117 insertions(+), 34 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index e07f359254c3..9a3c9de5ac5e 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 06a02707f488..19b6b292892c 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -1341,7 +1341,7 @@ void __init minsigstksz_setup(void)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index 4700f8522d27..bbd542704730 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index dad38960d1a8..82da8a2d769d 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -751,7 +751,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 570e43e6fda5..b4e410976e0d 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 505f78ddca82..652e366b68a0 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
 #define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31))
 #endif
 
+#ifdef CONFIG_X86_KERNEL_IBT
+#define DISABLE_IBT 0
+#else
+#define DISABLE_IBT (1 << (X86_FEATURE_IBT & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -128,7 +134,7 @@
 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
 			 DISABLE_ENQCMD)
 #define DISABLED_MASK17 0
-#define DISABLED_MASK18 0
+#define DISABLED_MASK18 (DISABLE_IBT)
 #define DISABLED_MASK19 0
 #define DISABLED_MASK20 0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index b241af4ce9b4..61e0e6301f09 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -614,7 +614,7 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF, xenpv_exc_double_fault);
 #endif
 
 /* #CP */
-#ifdef CONFIG_X86_KERNEL_IBT
+#ifdef CONFIG_X86_CET
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP, exc_control_protection);
 #endif
 
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..75e0dabf0c45 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -47,4 +47,16 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 				      struct stack_info *info);
 #endif
 
+static inline void cond_local_irq_enable(struct pt_regs *regs)
+{
+	if (regs->flags & X86_EFLAGS_IF)
+		local_irq_enable();
+}
+
+static inline void cond_local_irq_disable(struct pt_regs *regs)
+{
+	if (regs->flags & X86_EFLAGS_IF)
+		local_irq_disable();
+}
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 7ad22b705b64..cc10d8be9d74 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -4,10 +4,6 @@
 #include
 #include
 
-static __ro_after_init bool ibt_fatal = true;
-
-extern void ibt_selftest_ip(void); /* code label defined in asm below */
-
 enum cp_error_code {
 	CP_EC = (1 << 15) - 1,
 
@@ -20,15 +16,80 @@ enum cp_error_code {
 	CP_ENCL = 1 << 15,
 };
 
-DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+static const char cp_err[][10] = {
+	[0] = "unknown",
+	[1] = "near ret",
+	[2] = "far/iret",
+	[3] = "endbranch",
+	[4] = "rstorssp",
+	[5] = "setssbsy",
+};
+
+static const char *cp_err_string(unsigned long error_code)
+{
+	unsigned int cpec = error_code & CP_EC;
+
+	if (cpec >= ARRAY_SIZE(cp_err))
+		cpec = 0;
+	return cp_err[cpec];
+}
+
+static void do_unexpected_cp(struct pt_regs *regs, unsigned long error_code)
+{
+	WARN_ONCE(1, "Unexpected %s #CP, error_code: %s\n",
+		  user_mode(regs) ? "user mode" : "kernel mode",
+		  cp_err_string(error_code));
+}
+
+static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL,
+			      DEFAULT_RATELIMIT_BURST);
+
+static void do_user_cp_fault(struct pt_regs *regs, unsigned long error_code)
 {
-	if (!cpu_feature_enabled(X86_FEATURE_IBT)) {
-		pr_err("Unexpected #CP\n");
-		BUG();
+	struct task_struct *tsk;
+	unsigned long ssp;
+
+	/*
+	 * An exception was just taken from userspace. Since interrupts are disabled
+	 * here, no scheduling should have messed with the registers yet and they
+	 * will be whatever is live in userspace. So read the SSP before enabling
+	 * interrupts so locking the fpregs to do it later is not required.
+	 */
+	rdmsrl(MSR_IA32_PL3_SSP, ssp);
+
+	cond_local_irq_enable(regs);
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	/* Ratelimit to prevent log spamming. */
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    __ratelimit(&cpf_rate)) {
+		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)%s",
+			 tsk->comm, task_pid_nr(tsk),
+			 regs->ip, regs->sp, ssp, error_code,
+			 cp_err_string(error_code),
+			 error_code & CP_ENCL ? " in enclave" : "");
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
 	}
 
-	if (WARN_ON_ONCE(user_mode(regs) || (error_code & CP_EC) != CP_ENDBR))
+	force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0);
+	cond_local_irq_disable(regs);
+}
+
+static __ro_after_init bool ibt_fatal = true;
+
+/* code label defined in asm below */
+extern void ibt_selftest_ip(void);
+
+static void do_kernel_cp_fault(struct pt_regs *regs, unsigned long error_code)
+{
+	if ((error_code & CP_EC) != CP_ENDBR) {
+		do_unexpected_cp(regs, error_code);
 		return;
+	}
 
 	if (unlikely(regs->ip == (unsigned long)&ibt_selftest_ip)) {
 		regs->ax = 0;
@@ -74,3 +135,18 @@ static int __init ibt_setup(char *str)
 }
 
 __setup("ibt=", ibt_setup);
+
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	if (user_mode(regs)) {
+		if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+			do_user_cp_fault(regs, error_code);
+		else
+			do_unexpected_cp(regs, error_code);
+	} else {
+		if (cpu_feature_enabled(X86_FEATURE_IBT))
+			do_kernel_cp_fault(regs, error_code);
+		else
+			do_unexpected_cp(regs, error_code);
+	}
+}
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index a58c6bc1cd68..5074b8420359 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -107,7 +107,7 @@ static const __initconst struct idt_data def_idts[] = {
 	ISTG(X86_TRAP_MC, asm_exc_machine_check, IST_INDEX_MCE),
 #endif
 
-#ifdef CONFIG_X86_KERNEL_IBT
+#ifdef CONFIG_X86_CET
 	INTG(X86_TRAP_CP, asm_exc_control_protection),
 #endif
 
diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index 9027fc088f97..c12624bc82a3 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -402,7 +402,7 @@ int ia32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index 13a1e6083837..0e808c72bf7e 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -403,7 +403,7 @@ void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index cc223e60aba2..18fb9d620824 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -77,18 +77,6 @@
 
 DECLARE_BITMAP(system_vectors, NR_VECTORS);
 
-static inline void cond_local_irq_enable(struct pt_regs *regs)
-{
-	if (regs->flags & X86_EFLAGS_IF)
-		local_irq_enable();
-}
-
-static inline void cond_local_irq_disable(struct pt_regs *regs)
-{
-	if (regs->flags & X86_EFLAGS_IF)
-		local_irq_disable();
-}
-
 __always_inline int is_valid_bugaddr(unsigned long addr)
 {
 	if (addr < TASK_SIZE_MAX)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index bb59cc6ddb2d..9c29cd5393cc 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -640,7 +640,7 @@ static struct trap_array_entry trap_array[] = {
 	TRAP_ENTRY(exc_coprocessor_error, false ),
 	TRAP_ENTRY(exc_alignment_check, false ),
 	TRAP_ENTRY(exc_simd_coprocessor_error, false ),
-#ifdef CONFIG_X86_KERNEL_IBT
+#ifdef CONFIG_X86_CET
 	TRAP_ENTRY(exc_control_protection, false ),
 #endif
 };
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index 4a184f6e4e4d..7cdcb4ce6976 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -148,7 +148,7 @@ xen_pv_trap asm_exc_page_fault
 xen_pv_trap asm_exc_spurious_interrupt_bug
 xen_pv_trap asm_exc_coprocessor_error
 xen_pv_trap asm_exc_alignment_check
-#ifdef CONFIG_X86_KERNEL_IBT
+#ifdef CONFIG_X86_CET
 xen_pv_trap asm_exc_control_protection
 #endif
 #ifdef CONFIG_X86_MCE
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ffbe4cec9f32..0f52d0ac47c5 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -242,7 +242,8 @@ typedef struct siginfo {
 #define SEGV_ADIPERR 7 /* Precise MCD exception */
 #define SEGV_MTEAERR 8 /* Asynchronous ARM MTE error */
 #define SEGV_MTESERR 9 /* Synchronous ARM MTE exception */
-#define NSIGSEGV 9
+#define SEGV_CPERR 10 /* Control protection fault */
+#define NSIGSEGV 10
 
 /*
  * SIGBUS si_codes

From patchwork Sun Mar 19 00:15:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180137
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 09/40] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
Date: Sat, 18 Mar 2023 17:15:04 -0700
Message-Id: <20230319001535.23210-10-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

New processors that support Shadow Stack regard Write=0,Dirty=1 PTEs as
shadow stack pages.

In normal cases, it can be helpful to create Write=1 PTEs as also Dirty=1
if HW dirty tracking is not needed, because if the Dirty bit is not already
set the CPU has to set Dirty=1 when the memory gets written to. This
creates additional work for the CPU. So traditional wisdom was to simply
set the Dirty bit whenever you didn't care about it. However, it was never
really very helpful for read-only kernel memory.

When CR4.CET=1 and IA32_S_CET.SH_STK_EN=1, some instructions can write to
such supervisor memory. The kernel does not set IA32_S_CET.SH_STK_EN, so
avoiding kernel Write=0,Dirty=1 memory is not strictly needed for any
functional reason. But having Write=0,Dirty=1 kernel memory doesn't have
any functional benefit either, so to reduce ambiguity between shadow stack
and regular Write=0 pages, remove Dirty=1 from any kernel Write=0 PTEs.
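
As a rough illustration of the encoding described above, the check for a
"shadow stack looking" entry and the effect of clearing both bits can be
written as a userspace sketch. The bit positions (Write is bit 1, Dirty is
bit 6) are the architectural x86 page-table flag positions; the PTE_* macro
names and all three functions below are hypothetical and exist only for
this example, they are not kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE flag bits; macro names are local to this sketch. */
#define PTE_RW    (UINT64_C(1) << 1)	/* Write */
#define PTE_DIRTY (UINT64_C(1) << 6)	/* Dirty */

/* Hypothetical helper: true if the flags carry the Write=0,Dirty=1 encoding. */
bool pte_flags_look_like_shstk(uint64_t flags)
{
	return (flags & (PTE_RW | PTE_DIRTY)) == PTE_DIRTY;
}

/*
 * Hypothetical helper: make a kernel mapping read-only the way this patch
 * does it, by clearing Dirty together with Write.
 */
uint64_t pte_flags_make_kernel_ro(uint64_t flags)
{
	return flags & ~(PTE_RW | PTE_DIRTY);
}

/* Hypothetical self-check, not part of the patch. */
void pte_flags_selfcheck(void)
{
	uint64_t rw_dirty = PTE_RW | PTE_DIRTY;

	/* Clearing only Write leaves the ambiguous Write=0,Dirty=1 encoding. */
	assert(pte_flags_look_like_shstk(rw_dirty & ~PTE_RW));
	/* Clearing Write and Dirty together avoids it. */
	assert(!pte_flags_look_like_shstk(pte_flags_make_kernel_ro(rw_dirty)));
	/* A normal writable, dirty mapping is not affected. */
	assert(!pte_flags_look_like_shstk(rw_dirty));
}
```

The first assertion corresponds to the old behavior of clearing only
_PAGE_RW, and the second to clearing _PAGE_RW | _PAGE_DIRTY.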

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v6:
 - Also remove dirty from newly added set_memory_rox()

v5:
 - Spelling and grammar in commit log (Boris)

v3:
 - Update commit log (Andrew Cooper, Peterz)

v2:
 - Normalize PTE bit descriptions between patches
---
 arch/x86/include/asm/pgtable_types.h | 6 +++---
 arch/x86/mm/pat/set_memory.c         | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 447d4bee25c4..0646ad00178b 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -192,10 +192,10 @@ enum page_cache_mode {
 #define _KERNPG_TABLE            (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
 #define _PAGE_TABLE_NOENC        (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE              (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
-#define __PAGE_KERNEL_RO         (__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_ROX        (__PP|   0|   0|___A|   0|___D|   0|___G)
+#define __PAGE_KERNEL_RO         (__PP|   0|   0|___A|__NX|   0|   0|___G)
+#define __PAGE_KERNEL_ROX        (__PP|   0|   0|___A|   0|   0|   0|___G)
 #define __PAGE_KERNEL_NOCACHE    (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
-#define __PAGE_KERNEL_VVAR       (__PP|   0|_USR|___A|__NX|___D|   0|___G)
+#define __PAGE_KERNEL_VVAR       (__PP|   0|_USR|___A|__NX|   0|   0|___G)
 #define __PAGE_KERNEL_LARGE      (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP         (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 356758b7d4b4..1b5c0dc9f32b 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2073,12 +2073,12 @@ int set_memory_nx(unsigned long addr, int numpages)
 
 int set_memory_ro(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }
 
 int set_memory_rox(unsigned long addr, int numpages)
 {
-	pgprot_t clr = __pgprot(_PAGE_RW);
+	pgprot_t clr = __pgprot(_PAGE_RW | _PAGE_DIRTY);
 
 	if (__supported_pte_mask & _PAGE_NX)
 		clr.pgprot |= _PAGE_NX;

From patchwork Sun Mar 19 00:15:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180138
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 10/40] x86/mm: Move pmd_write(), pud_write() up in the file
Date: Sat, 18 Mar 2023 17:15:05 -0700
Message-Id: <20230319001535.23210-11-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

To prepare the introduction of _PAGE_SAVED_DIRTY, move pmd_write() and
pud_write() up in the file, so that they can be used by other helpers
below. No functional changes.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7425f32e5293..56eea96502c6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -160,6 +160,18 @@ static inline int pte_write(pte_t pte)
 	return pte_flags(pte) & _PAGE_RW;
 }
 
+#define pmd_write pmd_write
+static inline int pmd_write(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_RW;
+}
+
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_RW;
+}
+
 static inline int pte_huge(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_PSE;
@@ -1120,12 +1132,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pmd_t *pmdp);
 
 
-#define pmd_write pmd_write
-static inline int pmd_write(pmd_t pmd)
-{
-	return pmd_flags(pmd) & _PAGE_RW;
-}
-
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pmd_t *pmdp)
@@ -1155,12 +1161,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
-#define pud_write pud_write
-static inline int pud_write(pud_t pud)
-{
-	return pud_flags(pud) & _PAGE_RW;
-}
-
 #ifndef pmdp_establish
 #define pmdp_establish pmdp_establish
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,

From patchwork Sun Mar 19 00:15:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180139
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang, "Kirill A .
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 11/40] mm: Introduce pte_mkwrite_kernel() Date: Sat, 18 Mar 2023 17:15:06 -0700 Message-Id: <20230319001535.23210-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 35C96100005 X-Stat-Signature: nctycnoi7sthau3csb1jbmy4p4zst5gx X-Rspam-User: X-HE-Tag: 1679184973-614499 X-HE-Meta: U2FsdGVkX1+CPEIeNx4TX46Niu5Wj+wVXvGAfVYzPUowwm/5YRbJIJLSV/1UaOPpzg3qkE5Enf8XiTzlOkKOaJKJnTA29Xo/UWtzF80FHsnHQhW+o9gzQmexdZYRDDLerFvKdrwHr+MWUX7m9G0pcLXYo1ygYXnx/saItOdBCaUzVWsFKASYSh7iVypHnD4hkZDw/8DqHkXRhstlbtg+4/7By4EVu4oPNWiXWFsjoQtXUS+KMO42fI/LpTe/fyWZlLIh6hXIy0VZLfQKts+5+Mmg33jP5x5HZieoKwz/P/IJbZZeN3XyWEhEz7eErHTJ2hv/k6S567CGH/cUbmS04BUXrrc9uZBYkAnDbLwPVDLZCXW9GEFR3hEptCg1hz7YtYCmfAgo24vSqdqOfvFBBV+B+NM1MioQZpj5wjB/oD2A18yTy9rsyHdubN4ux6Kok6LER5MBnJaeXtj1+4T3XdmHE1GrUy+qg4v9DOLeYfK2X/RzjDeK+CbLz/JoxUp+OE2YN1/kUx6s7H9jYv32PO/4LC6/MaNo3qLK4ZOkY96ydrYT6CSnpI51olI14O1iMnaDhQklt8072gLMfHCAcM4F4e2JQe3+TkKeRN+DY72zyYxYOZ2xt5nq6myA42Fht+8AfrtJkHO+RkAYe60gGE1u86V7AGa909jf6IMwqo0Igy/3qVEKcKHHMeha8yQgAkx1oGCzdB8Gdy/fD92UD29QkE28KaJ8qFpMb/LEoM0/r6fR4foKs/nErjSEXJLSYC0Ysf/AUnaZcP4QSk3q8BSu16ypJEkonQDZ/KK59CCGe8rc0sSEJ2AAQ3URfiNM9Or2xtsyFvSEYLxpPw7XpYRmcaR0f20V6cD9dNleLxQNwaStZmHVuOBDRauQxRbYB0XpY5Ku9jwE549TMwj+UKtNP4gAp1A4d8JpwKu4r6dtJiUGT13fPvs5TzcrHTcp1nzw8GRbv2HEIB5weK4 zCqcRHxt 
TSgdGPHOMeW/go58QhDoCEDvCaPaFnF+O31ING7O9z0Y9KRicxRbHQOZh4fbT5nYd+j9XMVb61fhsd9T4Dl+n1hk0JZPQNw+sYXk5inzxKbuEQlxd3uNIi50QHnjPqdQ458oN9e2DwRSxAWoanjOhuAu6dQjMLrD2UvTjoQ6ERK1fs9H+W7754JquzjHeZyBajjqvA8xFBHvScVD6Mf/SiSvtrNYNnOOTHiV8wmq4c9Zubf41bpLW5UlWGgYst9vTKYOy8Jgv14zSXIwbB9Z5npaNkVFQOgzULAm5DgJ/Nw4HBjLdQdyQe6Qh3DT8SowRCsMp622ocdbABBCevzobAfGHYdMiSTwXC6uG7XKpw/sn6Rpwo86Rqj8RZjYTpgTM8agIbwmccQ2lhiNo4Sue8uMUXGLx5fX1RPCmlIp2KN49zHP+ZWyCEw9JPAHB3OJ0pLZsw1gc6LrmA6IIVk8Fjv8Y1REmTk5Kfzx2/mbBQR8fx1j4g+XS7zaxaiStDHhADuBVSDCYAdk7Dlv1FX6KzP7mUw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. One of these changes is to allow for pte_mkwrite() to create different types of writable memory (the existing conventionally writable type and also the new shadow stack type). Future patches will convert pte_mkwrite() to take a VMA in order to facilitate this, however there are places in the kernel where pte_mkwrite() is called outside of the context of a VMA. These are for kernel memory. So create a new variant called pte_mkwrite_kernel() and switch the kernel users over to it. Have pte_mkwrite() and pte_mkwrite_kernel() be the same for now. Future patches will introduce changes to make pte_mkwrite() take a VMA. Only do this for architectures that need it because they call pte_mkwrite() in arch code without an associated VMA. Since it will only currently be used in arch code, so do not include it in arch_pgtable_helpers.rst. 
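The split described above can be sketched in standalone C. This is only an
illustration under stated assumptions: the pte_t layout, DEMO_PAGE_RW, and
the demo_ helpers are simplified stand-ins invented for this sketch, not
the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an arch PTE; not the kernel's pte_t. */
typedef struct { uint64_t val; } pte_t;

/* Hypothetical "writable" bit, used only by this sketch. */
#define DEMO_PAGE_RW (1ULL << 1)

static inline pte_t demo_pte_set_flags(pte_t pte, uint64_t flags)
{
	pte.val |= flags;
	return pte;
}

/* VMA-less variant, intended for kernel mappings. */
static inline pte_t pte_mkwrite_kernel(pte_t pte)
{
	return demo_pte_set_flags(pte, DEMO_PAGE_RW);
}

/* Identical to the kernel variant for now; only this one changes later. */
static inline pte_t pte_mkwrite(pte_t pte)
{
	return pte_mkwrite_kernel(pte);
}

static inline bool demo_pte_write(pte_t pte)
{
	return (pte.val & DEMO_PAGE_RW) != 0;
}
```

Keeping pte_mkwrite() as a thin wrapper means callers that have no VMA can
move to pte_mkwrite_kernel() now, and a later signature change to
pte_mkwrite() does not touch them.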
Suggested-by: David Hildenbrand
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: David Hildenbrand
Acked-by: Deepak Gupta
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/
---
Hi non-x86 arches,

x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] changing pXX_mkwrite()
to take a VMA, so awareness of shadow stack memory can be moved into x86
code.

Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. That is why you are seeing some patches out of a big x86 series
pop up in your arch mailing list. There is no functional change. After
this refactor, the shadow stack series goes on to use the arch helpers to
push shadow stack memory details inside arch/x86.

Testing was just 0-day build testing.

Hopefully that is enough context. Thanks!

[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/

v6:
 - New patch
---
 arch/arm64/include/asm/pgtable.h | 7 ++++++-
 arch/arm64/mm/trans_pgd.c | 4 ++--
 arch/s390/include/asm/pgtable.h | 7 ++++++-
 arch/s390/mm/pageattr.c | 2 +-
 arch/x86/include/asm/pgtable.h | 7 ++++++-
 arch/x86/xen/mmu_pv.c | 2 +-
 6 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..cccf8885792e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -180,13 +180,18 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
 	return pmd;
 }
 
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
 {
 	pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
 	pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
 	return pte;
 }
 
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+	return pte_mkwrite_kernel(pte);
+}
+
 static inline pte_t pte_mkclean(pte_t pte)
 {
 	pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY));
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 4ea2eefbc053..5c07e68d80ea 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 * read only (code, rodata). Clear the RDONLY bit from
 		 * the temporary mappings we use during restore.
 		 */
-		set_pte(dst_ptep, pte_mkwrite(pte));
+		set_pte(dst_ptep, pte_mkwrite_kernel(pte));
 	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
 		/*
 		 * debug_pagealloc will removed the PTE_VALID bit if
@@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 */
 		BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_kernel(pte)));
 	}
 }
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c70b4d1263d..d4943f2d3f00 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1005,7 +1005,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 	return set_pte_bit(pte, __pgprot(_PAGE_PROTECT));
 }
 
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
 {
 	pte = set_pte_bit(pte, __pgprot(_PAGE_WRITE));
 	if (pte_val(pte) & _PAGE_DIRTY)
@@ -1013,6 +1013,11 @@ static inline pte_t pte_mkwrite(pte_t pte)
 	return pte;
 }
 
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+	return pte_mkwrite_kernel(pte);
+}
+
 static inline pte_t pte_mkclean(pte_t pte)
 {
 	pte = clear_pte_bit(pte, __pgprot(_PAGE_DIRTY));
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 85195c18b2e8..4ee5fe5caa23 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -96,7 +96,7 @@ static int walk_pte_level(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		if (flags & SET_MEMORY_RO)
 			new = pte_wrprotect(new);
 		else if (flags & SET_MEMORY_RW)
-			new = pte_mkwrite(pte_mkdirty(new));
+			new = pte_mkwrite_kernel(pte_mkdirty(new));
 		if (flags & SET_MEMORY_NX)
 			new = set_pte_bit(new, __pgprot(_PAGE_NOEXEC));
 		else if (flags & SET_MEMORY_X)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 56eea96502c6..3607f2572f9e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -364,11 +364,16 @@ static inline pte_t pte_mkyoung(pte_t pte)
 	return pte_set_flags(pte, _PAGE_ACCESSED);
 }
 
-static inline pte_t pte_mkwrite(pte_t pte)
+static inline pte_t pte_mkwrite_kernel(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_RW);
 }
 
+static inline pte_t pte_mkwrite(pte_t pte)
+{
+	return pte_mkwrite_kernel(pte);
+}
+
 static inline pte_t pte_mkhuge(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_PSE);
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index ee29fb558f2e..a23f04243c19 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -150,7 +150,7 @@ void make_lowmem_page_readwrite(void *vaddr)
 	if (pte == NULL)
 		return;		/* vaddr missing */
 
-	ptev = pte_mkwrite(*pte);
+	ptev = pte_mkwrite_kernel(*pte);
 
 	if (HYPERVISOR_update_va_mapping(address, ptev, 0))
 		BUG();

From patchwork Sun Mar 19 00:15:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180140
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 12/40] s390/mm: Introduce pmd_mkwrite_kernel()
Date: Sat, 18 Mar 2023 17:15:07 -0700
Message-Id: <20230319001535.23210-13-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which require some core mm changes to function
properly.

One of these changes is to allow for pmd_mkwrite() to create different
types of writable memory (the existing conventionally writable type and
also the new shadow stack type). Future patches will convert pmd_mkwrite()
to take a VMA in order to facilitate this; however, there are places in
the kernel where pmd_mkwrite() is called outside of the context of a VMA.
These are for kernel memory. So create a new variant called
pmd_mkwrite_kernel() and switch the kernel users over to it. Have
pmd_mkwrite() and pmd_mkwrite_kernel() be the same for now. Future patches
will introduce changes to make pmd_mkwrite() take a VMA.

Only do this for architectures that need it because they call
pmd_mkwrite() in arch code without an associated VMA. Since it will
currently only be used in arch code, do not include it in
arch_pgtable_helpers.rst.
Suggested-by: David Hildenbrand
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: Heiko Carstens
Acked-by: David Hildenbrand
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/
---
Hi non-x86 arches,

x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] changing pXX_mkwrite()
to take a VMA, so awareness of shadow stack memory can be moved into x86
code.

Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. That is why you are seeing some patches out of a big x86 series
pop up in your arch mailing list. There is no functional change. After
this refactor, the shadow stack series goes on to use the arch helpers to
push shadow stack memory details inside arch/x86.

Testing was just 0-day build testing.

Hopefully that is enough context. Thanks!

[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/

v6:
 - New patch
---
 arch/s390/include/asm/pgtable.h | 7 ++++++-
 arch/s390/mm/pageattr.c | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index d4943f2d3f00..deeb918cae1d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1491,7 +1491,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd)
 	return set_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_PROTECT));
 }
 
-static inline pmd_t pmd_mkwrite(pmd_t pmd)
+static inline pmd_t pmd_mkwrite_kernel(pmd_t pmd)
 {
 	pmd = set_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_WRITE));
 	if (pmd_val(pmd) & _SEGMENT_ENTRY_DIRTY)
@@ -1499,6 +1499,11 @@ static inline pmd_t pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pmd_t pmd_mkwrite(pmd_t pmd)
+{
+	return pmd_mkwrite_kernel(pmd);
+}
+
 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
 	pmd = clear_pmd_bit(pmd, __pgprot(_SEGMENT_ENTRY_DIRTY));
diff --git a/arch/s390/mm/pageattr.c b/arch/s390/mm/pageattr.c
index 4ee5fe5caa23..7b6967dfacd0 100644
--- a/arch/s390/mm/pageattr.c
+++ b/arch/s390/mm/pageattr.c
@@ -146,7 +146,7 @@ static void modify_pmd_page(pmd_t *pmdp, unsigned long addr,
 	if (flags & SET_MEMORY_RO)
 		new = pmd_wrprotect(new);
 	else if (flags & SET_MEMORY_RW)
-		new = pmd_mkwrite(pmd_mkdirty(new));
+		new = pmd_mkwrite_kernel(pmd_mkdirty(new));
 	if (flags & SET_MEMORY_NX)
 		new = set_pmd_bit(new, __pgprot(_SEGMENT_ENTRY_NOEXEC));
 	else if (flags & SET_MEMORY_X)

From patchwork Sun Mar 19 00:15:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180141
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 13/40] mm: Make pte_mkwrite() take a VMA
Date: Sat, 18 Mar 2023 17:15:08 -0700
Message-Id: <20230319001535.23210-14-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which require some core mm changes to function
properly.

One of these unusual properties is that shadow stack memory is writable,
but only in limited ways.
These limits are applied via a specific PTE bit combination. Nevertheless,
the memory is writable, and core mm code will need to apply the writable
permissions in the typical paths that call pte_mkwrite().

In addition to VM_WRITE, the shadow stack VMAs will have a flag denoting
that they are the special shadow stack flavor of writable memory. So make
pte_mkwrite() take a VMA, so that the x86 implementation of it can know
whether to create regular writable memory or shadow stack memory.

Apply the same changes for pmd_mkwrite() and huge_pte_mkwrite().

No functional change.

Suggested-by: David Hildenbrand
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: Michael Ellerman
Acked-by: David Hildenbrand
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Link: https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/
---
Hi non-x86 arches,

x86 has a feature that allows for the creation of a special type of
writable memory (shadow stack) that is only writable in limited specific
ways. Previously, changes were proposed to core MM code to teach it to
decide when to create normally writable memory or the special shadow stack
writable memory, but David Hildenbrand suggested[0] changing pXX_mkwrite()
to take a VMA, so awareness of shadow stack memory can be moved into x86
code.

Since pXX_mkwrite() is defined in every arch, it requires some tree-wide
changes. That is why you are seeing some patches out of a big x86 series
pop up in your arch mailing list. There is no functional change. After
this refactor, the shadow stack series goes on to use the arch helpers to
push shadow stack memory details inside arch/x86.

Testing was just 0-day build testing.

Hopefully that is enough context. Thanks!
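
As a rough illustration of where the VMA argument is headed, an arch could
eventually pick the flavor of writable PTE from the VMA flags. The sketch
below is standalone C under loud assumptions: DEMO_VM_SHADOW_STACK,
DEMO_PAGE_SHSTK, and the struct layouts are hypothetical names made up
here, and the real shadow stack PTE encoding is not modeled. This patch
itself only adds the parameter and changes no behavior.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's pte_t or vm_area_struct. */
typedef struct { uint64_t val; } pte_t;
struct vm_area_struct { unsigned long vm_flags; };

/* Hypothetical flag and bit values, chosen only for this sketch. */
#define DEMO_VM_WRITE		0x2UL
#define DEMO_VM_SHADOW_STACK	0x100UL
#define DEMO_PAGE_RW		(1ULL << 1)
#define DEMO_PAGE_SHSTK		(1ULL << 2)

/* Conventionally writable memory. */
static inline pte_t demo_pte_mkwrite_novma(pte_t pte)
{
	pte.val |= DEMO_PAGE_RW;
	return pte;
}

/* Stand-in for the "limited" shadow stack flavor of writable memory. */
static inline pte_t demo_pte_mkwrite_shstk(pte_t pte)
{
	pte.val |= DEMO_PAGE_SHSTK;
	return pte;
}

/* The VMA tells the arch which kind of writable PTE to create. */
static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (vma->vm_flags & DEMO_VM_SHADOW_STACK)
		return demo_pte_mkwrite_shstk(pte);

	return demo_pte_mkwrite_novma(pte);
}
```

Architectures without such a feature can simply ignore the vma argument,
which is what the conversions in this patch do.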

[0] https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e4037b0@redhat.com/

v6:
 - New patch
---
 Documentation/mm/arch_pgtable_helpers.rst | 9 ++++++---
 arch/alpha/include/asm/pgtable.h | 6 +++++-
 arch/arc/include/asm/hugepage.h | 2 +-
 arch/arc/include/asm/pgtable-bits-arcv2.h | 7 ++++++-
 arch/arm/include/asm/pgtable-3level.h | 7 ++++++-
 arch/arm/include/asm/pgtable.h | 2 +-
 arch/arm64/include/asm/pgtable.h | 4 ++--
 arch/csky/include/asm/pgtable.h | 2 +-
 arch/hexagon/include/asm/pgtable.h | 2 +-
 arch/ia64/include/asm/pgtable.h | 2 +-
 arch/loongarch/include/asm/pgtable.h | 4 ++--
 arch/m68k/include/asm/mcf_pgtable.h | 2 +-
 arch/m68k/include/asm/motorola_pgtable.h | 6 +++++-
 arch/m68k/include/asm/sun3_pgtable.h | 6 +++++-
 arch/microblaze/include/asm/pgtable.h | 2 +-
 arch/mips/include/asm/pgtable.h | 6 +++---
 arch/nios2/include/asm/pgtable.h | 2 +-
 arch/openrisc/include/asm/pgtable.h | 2 +-
 arch/parisc/include/asm/pgtable.h | 6 +++++-
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 ++--
 arch/powerpc/include/asm/nohash/32/pgtable.h | 2 +-
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 2 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h | 2 +-
 arch/riscv/include/asm/pgtable.h | 6 +++---
 arch/s390/include/asm/hugetlb.h | 4 ++--
 arch/s390/include/asm/pgtable.h | 4 ++--
 arch/sh/include/asm/pgtable_32.h | 10 ++++++++--
 arch/sparc/include/asm/pgtable_32.h | 2 +-
 arch/sparc/include/asm/pgtable_64.h | 6 +++---
 arch/um/include/asm/pgtable.h | 2 +-
 arch/x86/include/asm/pgtable.h | 6 ++++--
 arch/xtensa/include/asm/pgtable.h | 2 +-
 include/asm-generic/hugetlb.h | 4 ++--
 include/linux/mm.h | 2 +-
 mm/debug_vm_pgtable.c | 16 ++++++++--------
 mm/huge_memory.c | 6 +++---
 mm/hugetlb.c | 4 ++--
 mm/memory.c | 4 ++--
 mm/migrate_device.c | 2 +-
 mm/mprotect.c | 2 +-
 mm/userfaultfd.c | 2 +-
 42 files changed, 106 insertions(+), 69 deletions(-)

diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
index 30d9a09f01f4..78ac3ff2fe1d 100644
--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -46,7 +46,8 @@ PTE Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_mkclean               | Creates a clean PTE                              |
 +---------------------------+--------------------------------------------------+
-| pte_mkwrite               | Creates a writable PTE                           |
+| pte_mkwrite               | Creates a writable PTE of the type specified by  |
+|                           | the VMA.                                         |
 +---------------------------+--------------------------------------------------+
 | pte_wrprotect             | Creates a write protected PTE                    |
 +---------------------------+--------------------------------------------------+
@@ -118,7 +119,8 @@ PMD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pmd_mkclean               | Creates a clean PMD                              |
 +---------------------------+--------------------------------------------------+
-| pmd_mkwrite               | Creates a writable PMD                           |
+| pmd_mkwrite               | Creates a writable PMD of the type specified by  |
+|                           | the VMA.                                         |
 +---------------------------+--------------------------------------------------+
 | pmd_wrprotect             | Creates a write protected PMD                    |
 +---------------------------+--------------------------------------------------+
@@ -222,7 +224,8 @@ HugeTLB Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | huge_pte_mkdirty          | Creates a dirty HugeTLB                          |
 +---------------------------+--------------------------------------------------+
-| huge_pte_mkwrite          | Creates a writable HugeTLB                       |
+| huge_pte_mkwrite          | Creates a writable HugeTLB of the type specified |
+|                           | by the VMA.                                      |
 +---------------------------+--------------------------------------------------+
 | huge_pte_wrprotect        | Creates a write protected HugeTLB                |
 +---------------------------+--------------------------------------------------+
diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index ba43cb841d19..fb5d207c2a89 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -256,9 +256,13 @@ extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;
 extern inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_FOW; return pte; }
 extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; }
 extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); return pte; }
-extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; }
 extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; }
 extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; }
+extern inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	pte_val(pte) &= ~_PAGE_FOW;
+	return pte;
+}
 
 /*
  * The smp_rmb() in the following functions are required to order the load of
diff --git a/arch/arc/include/asm/hugepage.h b/arch/arc/include/asm/hugepage.h
index 5001b796fb8d..223a96967188 100644
--- a/arch/arc/include/asm/hugepage.h
+++ b/arch/arc/include/asm/hugepage.h
@@ -21,7 +21,7 @@ static inline pmd_t pte_pmd(pte_t pte)
 }
 
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
-#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mkwrite(pmd, vma) pte_pmd(pte_mkwrite(pmd_pte(pmd), (vma)))
 #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index
6e9f8ca6d6a1..a5b8bc955015 100644 --- a/arch/arc/include/asm/pgtable-bits-arcv2.h +++ b/arch/arc/include/asm/pgtable-bits-arcv2.h @@ -87,7 +87,6 @@ PTE_BIT_FUNC(mknotpresent, &= ~(_PAGE_PRESENT)); PTE_BIT_FUNC(wrprotect, &= ~(_PAGE_WRITE)); -PTE_BIT_FUNC(mkwrite, |= (_PAGE_WRITE)); PTE_BIT_FUNC(mkclean, &= ~(_PAGE_DIRTY)); PTE_BIT_FUNC(mkdirty, |= (_PAGE_DIRTY)); PTE_BIT_FUNC(mkold, &= ~(_PAGE_ACCESSED)); @@ -95,6 +94,12 @@ PTE_BIT_FUNC(mkyoung, |= (_PAGE_ACCESSED)); PTE_BIT_FUNC(mkspecial, |= (_PAGE_SPECIAL)); PTE_BIT_FUNC(mkhuge, |= (_PAGE_HW_SZ)); +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + pte_val(pte) |= (_PAGE_WRITE); + return pte; +} + static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot)); diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h index 106049791500..df071a807610 100644 --- a/arch/arm/include/asm/pgtable-3level.h +++ b/arch/arm/include/asm/pgtable-3level.h @@ -202,11 +202,16 @@ static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; } PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF); -PMD_BIT_FUNC(mkwrite, &= ~L_PMD_SECT_RDONLY); PMD_BIT_FUNC(mkdirty, |= L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkclean, &= ~L_PMD_SECT_DIRTY); PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF); +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + pmd_val(pmd) &= ~L_PMD_SECT_RDONLY; + return pmd; +} + #define pmd_mkhuge(pmd) (__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT)) #define pmd_pfn(pmd) (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT) diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index a58ccbb406ad..39ad1ae1308d 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -227,7 +227,7 @@ static inline pte_t pte_wrprotect(pte_t pte) return set_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } -static inline pte_t
pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return clear_pte_bit(pte, __pgprot(L_PTE_RDONLY)); } diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index cccf8885792e..913bf370f74a 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -187,7 +187,7 @@ static inline pte_t pte_mkwrite_kernel(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_kernel(pte); } @@ -492,7 +492,7 @@ static inline int pmd_trans_huge(pmd_t pmd) #define pmd_cont(pmd) pte_cont(pmd_pte(pmd)) #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd))) #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite(pmd, vma) pte_pmd(pte_mkwrite(pmd_pte(pmd), (vma))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h index d4042495febc..c2f92c991e37 100644 --- a/arch/csky/include/asm/pgtable.h +++ b/arch/csky/include/asm/pgtable.h @@ -176,7 +176,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h index 59393613d086..14ab9c789c0e 100644 --- a/arch/hexagon/include/asm/pgtable.h +++ b/arch/hexagon/include/asm/pgtable.h @@ -300,7 +300,7 @@ static inline pte_t pte_wrprotect(pte_t pte) } /* pte_mkwrite - mark page as writable */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct 
vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h index 21c97e31a28a..f879dd626da6 100644 --- a/arch/ia64/include/asm/pgtable.h +++ b/arch/ia64/include/asm/pgtable.h @@ -268,7 +268,7 @@ ia64_phys_addr_valid (unsigned long addr) * access rights: */ #define pte_wrprotect(pte) (__pte(pte_val(pte) & ~_PAGE_AR_RW)) -#define pte_mkwrite(pte) (__pte(pte_val(pte) | _PAGE_AR_RW)) +#define pte_mkwrite(pte, vma) (__pte(pte_val(pte) | _PAGE_AR_RW)) #define pte_mkold(pte) (__pte(pte_val(pte) & ~_PAGE_A)) #define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A)) #define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D)) diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h index d28fb9dbec59..ebf645f40298 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -390,7 +390,7 @@ static inline pte_t pte_mkdirty(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -490,7 +490,7 @@ static inline int pmd_write(pmd_t pmd) return !!(pmd_val(pmd) & _PAGE_WRITE); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h index 13741c1245e1..37d77e055016 100644 --- a/arch/m68k/include/asm/mcf_pgtable.h +++ b/arch/m68k/include/asm/mcf_pgtable.h @@ -211,7 +211,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= CF_PAGE_WRITABLE; return pte; diff --git a/arch/m68k/include/asm/motorola_pgtable.h 
b/arch/m68k/include/asm/motorola_pgtable.h index ec0dc19ab834..c4e8eb76286d 100644 --- a/arch/m68k/include/asm/motorola_pgtable.h +++ b/arch/m68k/include/asm/motorola_pgtable.h @@ -155,7 +155,6 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_RONLY; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_RONLY; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) @@ -168,6 +167,11 @@ static inline pte_t pte_mkcache(pte_t pte) pte_val(pte) = (pte_val(pte) & _CACHEMASK040) | m68k_supervisor_cachemode; return pte; } +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + pte_val(pte) &= ~_PAGE_RONLY; + return pte; +} #define swapper_pg_dir kernel_pg_dir extern pgd_t kernel_pg_dir[128]; diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h index e582b0484a55..2a06bea51a1e 100644 --- a/arch/m68k/include/asm/sun3_pgtable.h +++ b/arch/m68k/include/asm/sun3_pgtable.h @@ -143,10 +143,14 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & SUN3_PAGE_ACCESS static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_MODIFIED; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= SUN3_PAGE_MODIFIED; return 
pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= SUN3_PAGE_ACCESSED; return pte; } static inline pte_t pte_mknocache(pte_t pte) { pte_val(pte) |= SUN3_PAGE_NOCACHE; return pte; } +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + pte_val(pte) |= SUN3_PAGE_WRITEABLE; + return pte; +} // use this version when caches work... //static inline pte_t pte_mkcache(pte_t pte) { pte_val(pte) &= SUN3_PAGE_NOCACHE; return pte; } // until then, use: diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h index d1b8272abcd9..5b83e82f8d7e 100644 --- a/arch/microblaze/include/asm/pgtable.h +++ b/arch/microblaze/include/asm/pgtable.h @@ -266,7 +266,7 @@ static inline pte_t pte_mkread(pte_t pte) \ { pte_val(pte) |= _PAGE_USER; return pte; } static inline pte_t pte_mkexec(pte_t pte) \ { pte_val(pte) |= _PAGE_USER | _PAGE_EXEC; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) \ +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) \ { pte_val(pte) |= _PAGE_RW; return pte; } static inline pte_t pte_mkdirty(pte_t pte) \ { pte_val(pte) |= _PAGE_DIRTY; return pte; } diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h index 791389bf3c12..06efd567144a 100644 --- a/arch/mips/include/asm/pgtable.h +++ b/arch/mips/include/asm/pgtable.h @@ -309,7 +309,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte.pte_low |= _PAGE_WRITE; if (pte.pte_low & _PAGE_MODIFIED) { @@ -364,7 +364,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; if (pte_val(pte) & _PAGE_MODIFIED) @@ -626,7 +626,7 @@ static inline pmd_t pmd_wrprotect(pmd_t pmd) return pmd; } -static inline pmd_t pmd_mkwrite(pmd_t pmd) 
+static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { pmd_val(pmd) |= _PAGE_WRITE; if (pmd_val(pmd) & _PAGE_MODIFIED) diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h index 0f5c2564e9f5..edd458518e0e 100644 --- a/arch/nios2/include/asm/pgtable.h +++ b/arch/nios2/include/asm/pgtable.h @@ -129,7 +129,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h index 3eb9b9555d0d..fd40aec189d1 100644 --- a/arch/openrisc/include/asm/pgtable.h +++ b/arch/openrisc/include/asm/pgtable.h @@ -250,7 +250,7 @@ static inline pte_t pte_mkold(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITE; return pte; diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h index e2950f5db7c9..89f62137e67f 100644 --- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -331,8 +331,12 @@ static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; retu static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_WRITE; return pte; } static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; return pte; } +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + pte_val(pte) |= _PAGE_WRITE; + return pte; +} /* * Huge pte definitions. 
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h index 7bf1fe7297c6..10d9a1d2aca9 100644 --- a/arch/powerpc/include/asm/book3s/32/pgtable.h +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h @@ -498,7 +498,7 @@ static inline pte_t pte_mkpte(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 4acc9690f599..be0636522d36 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -600,7 +600,7 @@ static inline pte_t pte_mkexec(pte_t pte) return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC)); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { /* * write implies read, hence set both @@ -1071,7 +1071,7 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) -#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) +#define pmd_mkwrite(pmd, vma) pte_pmd(pte_mkwrite(pmd_pte(pmd), (vma))) #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY #define pmd_soft_dirty(pmd) pte_soft_dirty(pmd_pte(pmd)) diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h index fec56d965f00..7bfbcb9ba55b 100644 --- a/arch/powerpc/include/asm/nohash/32/pgtable.h +++ b/arch/powerpc/include/asm/nohash/32/pgtable.h @@ -171,7 +171,7 @@ void unmap_kernel_page(unsigned long va); do { pte_update(mm, addr, ptep, ~0, 0, 0); } while (0) #ifndef pte_mkwrite -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return 
__pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h index 1a89ebdc3acc..f32450eb270a 100644 --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h @@ -101,7 +101,7 @@ static inline int pte_write(pte_t pte) #define pte_write pte_write -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return __pte(pte_val(pte) & ~_PAGE_RO); } diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h b/arch/powerpc/include/asm/nohash/64/pgtable.h index 287e25864ffa..589009555877 100644 --- a/arch/powerpc/include/asm/nohash/64/pgtable.h +++ b/arch/powerpc/include/asm/nohash/64/pgtable.h @@ -85,7 +85,7 @@ #ifndef __ASSEMBLY__ /* pte_clear moved to later in this file */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return __pte(pte_val(pte) | _PAGE_RW); } diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index ab05f892d317..93de938f44ec 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -338,7 +338,7 @@ static inline pte_t pte_wrprotect(pte_t pte) /* static inline pte_t pte_mkread(pte_t pte) */ -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return __pte(pte_val(pte) | _PAGE_WRITE); } @@ -624,9 +624,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pte_pmd(pte_mkyoung(pmd_pte(pmd))); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { - return pte_pmd(pte_mkwrite(pmd_pte(pmd))); + return pte_pmd(pte_mkwrite(pmd_pte(pmd), vma)); } static inline pmd_t pmd_wrprotect(pmd_t pmd) diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h index ccdbccfde148..558f7eef9c4d 100644 --- 
a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -102,9 +102,9 @@ static inline int huge_pte_dirty(pte_t pte) return pte_dirty(pte); } -static inline pte_t huge_pte_mkwrite(pte_t pte) +static inline pte_t huge_pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { - return pte_mkwrite(pte); + return pte_mkwrite(pte, vma); } static inline pte_t huge_pte_mkdirty(pte_t pte) diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h index deeb918cae1d..8f2c743da0eb 100644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1013,7 +1013,7 @@ static inline pte_t pte_mkwrite_kernel(pte_t pte) return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_kernel(pte); } @@ -1499,7 +1499,7 @@ static inline pmd_t pmd_mkwrite_kernel(pmd_t pmd) return pmd; } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { return pmd_mkwrite_kernel(pmd); } diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h index 21952b094650..9f2dcb9eafc8 100644 --- a/arch/sh/include/asm/pgtable_32.h +++ b/arch/sh/include/asm/pgtable_32.h @@ -351,6 +351,12 @@ static inline void set_pte(pte_t *ptep, pte_t pte) #define PTE_BIT_FUNC(h,fn,op) \ static inline pte_t pte_##fn(pte_t pte) { pte.pte_##h op; return pte; } +#define PTE_BIT_FUNC_VMA(h,fn,op) \ +static inline pte_t pte_##fn(pte_t pte, struct vm_area_struct *vma) \ +{ \ + pte.pte_##h op; \ + return pte; \ +} #ifdef CONFIG_X2TLB /* @@ -359,11 +365,11 @@ static inline pte_t pte_##fn(pte_t pte) { pte.pte_##h op; return pte; } * kernel permissions), we attempt to couple them a bit more sanely here. 
*/ PTE_BIT_FUNC(high, wrprotect, &= ~(_PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE)); -PTE_BIT_FUNC(high, mkwrite, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); +PTE_BIT_FUNC_VMA(high, mkwrite, |= _PAGE_EXT_USER_WRITE | _PAGE_EXT_KERN_WRITE); PTE_BIT_FUNC(high, mkhuge, |= _PAGE_SZHUGE); #else PTE_BIT_FUNC(low, wrprotect, &= ~_PAGE_RW); -PTE_BIT_FUNC(low, mkwrite, |= _PAGE_RW); +PTE_BIT_FUNC_VMA(low, mkwrite, |= _PAGE_RW); PTE_BIT_FUNC(low, mkhuge, |= _PAGE_SZHUGE); #endif diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h index d4330e3c57a6..3e8836179456 100644 --- a/arch/sparc/include/asm/pgtable_32.h +++ b/arch/sparc/include/asm/pgtable_32.h @@ -241,7 +241,7 @@ static inline pte_t pte_mkold(pte_t pte) return __pte(pte_val(pte) & ~SRMMU_REF); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return __pte(pte_val(pte) | SRMMU_WRITE); } diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 2dc8d4641734..c5cd5c03f557 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -466,7 +466,7 @@ static inline pte_t pte_mkclean(pte_t pte) return __pte(val); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { unsigned long val = pte_val(pte), mask; @@ -756,11 +756,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return __pmd(pte_val(pte)); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { pte_t pte = __pte(pmd_val(pmd)); - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); return __pmd(pte_val(pte)); } diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h index a70d1618eb35..963479c133b7 100644 --- a/arch/um/include/asm/pgtable.h +++ b/arch/um/include/asm/pgtable.h @@ -207,7 +207,7 @@ static inline pte_t pte_mkyoung(pte_t pte) 
return(pte); } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (unlikely(pte_get_bits(pte, _PAGE_RW))) return pte; diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 3607f2572f9e..66c514808276 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -369,7 +369,9 @@ static inline pte_t pte_mkwrite_kernel(pte_t pte) return pte_set_flags(pte, _PAGE_RW); } -static inline pte_t pte_mkwrite(pte_t pte) +struct vm_area_struct; + +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { return pte_mkwrite_kernel(pte); } @@ -470,7 +472,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_ACCESSED); } -static inline pmd_t pmd_mkwrite(pmd_t pmd) +static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { return pmd_set_flags(pmd, _PAGE_RW); } diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h index fc7a14884c6c..d72632d9c53c 100644 --- a/arch/xtensa/include/asm/pgtable.h +++ b/arch/xtensa/include/asm/pgtable.h @@ -262,7 +262,7 @@ static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } -static inline pte_t pte_mkwrite(pte_t pte) +static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { pte_val(pte) |= _PAGE_WRITABLE; return pte; } #define pgprot_noncached(prot) \ diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index d7f6335d3999..e86c830728de 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -20,9 +20,9 @@ static inline unsigned long huge_pte_dirty(pte_t pte) return pte_dirty(pte); } -static inline pte_t huge_pte_mkwrite(pte_t pte) +static inline pte_t huge_pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { - return pte_mkwrite(pte); + return pte_mkwrite(pte, 
vma); } #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f79667824eb..af652444fbba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1163,7 +1163,7 @@ void free_compound_page(struct page *page); static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); return pte; } diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index af59cc7bd307..7bc5592900bc 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -109,10 +109,10 @@ static void __init pte_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pte_same(pte, pte)); WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte)))); WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte)))); - WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte)))); + WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte), args->vma))); WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte)))); WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte)))); - WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte)))); + WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte, args->vma)))); WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte)))); WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte)))); } @@ -153,7 +153,7 @@ static void __init pte_advanced_tests(struct pgtable_debug_args *args) pte = pte_mkclean(pte); set_pte_at(args->mm, args->vaddr, args->ptep, pte); flush_dcache_page(page); - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, args->vma); pte = pte_mkdirty(pte); ptep_set_access_flags(args->vma, args->vaddr, args->ptep, pte, 1); pte = ptep_get(args->ptep); @@ -199,10 +199,10 @@ static void __init pmd_basic_tests(struct pgtable_debug_args *args, int idx) WARN_ON(!pmd_same(pmd, pmd)); WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd)))); WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd)))); - WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd)))); + WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd), 
args->vma))); WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd)))); WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd)))); - WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd)))); + WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd, args->vma)))); WARN_ON(pmd_dirty(pmd_wrprotect(pmd_mkclean(pmd)))); WARN_ON(!pmd_dirty(pmd_wrprotect(pmd_mkdirty(pmd)))); /* @@ -253,7 +253,7 @@ static void __init pmd_advanced_tests(struct pgtable_debug_args *args) pmd = pmd_mkclean(pmd); set_pmd_at(args->mm, vaddr, args->pmdp, pmd); flush_dcache_page(page); - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, args->vma); pmd = pmd_mkdirty(pmd); pmdp_set_access_flags(args->vma, vaddr, args->pmdp, pmd, 1); pmd = READ_ONCE(*args->pmdp); @@ -928,8 +928,8 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) pte = mk_huge_pte(page, args->page_prot); WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte))); - WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte)))); - WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte)))); + WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte), args->vma))); + WARN_ON(huge_pte_write(huge_pte_wrprotect(huge_pte_mkwrite(pte, args->vma)))); #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB pte = pfn_pte(args->fixed_pmd_pfn, args->page_prot); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 4fc43859e59a..aaf815838144 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -555,7 +555,7 @@ __setup("transparent_hugepage=", setup_transparent_hugepage); pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); return pmd; } @@ -1580,7 +1580,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) pmd = pmd_modify(oldpmd, vma->vm_page_prot); pmd = pmd_mkyoung(pmd); if (writable) - pmd = pmd_mkwrite(pmd); + pmd = pmd_mkwrite(pmd, vma); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); 
spin_unlock(vmf->ptl); @@ -1926,7 +1926,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, /* See change_pte_range(). */ if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) && can_change_pmd_writable(vma, addr, entry)) - entry = pmd_mkwrite(entry); + entry = pmd_mkwrite(entry, vma); ret = HPAGE_PMD_NR; set_pmd_at(mm, addr, pmd, entry); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 07abcb6eb203..6af471bdcff8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4900,7 +4900,7 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, if (writable) { entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page, - vma->vm_page_prot))); + vma->vm_page_prot)), vma); } else { entry = huge_pte_wrprotect(mk_huge_pte(page, vma->vm_page_prot)); @@ -4916,7 +4916,7 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma, { pte_t entry; - entry = huge_pte_mkwrite(huge_pte_mkdirty(huge_ptep_get(ptep))); + entry = huge_pte_mkwrite(huge_pte_mkdirty(huge_ptep_get(ptep)), vma); if (huge_ptep_set_access_flags(vma, address, ptep, entry, 1)) update_mmu_cache(vma, address, ptep); } diff --git a/mm/memory.c b/mm/memory.c index f456f3b5049c..d0972d2d6f36 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4067,7 +4067,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(&folio->page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4755,7 +4755,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (writable) - pte = pte_mkwrite(pte); + pte = pte_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); diff --git 
a/mm/migrate_device.c b/mm/migrate_device.c index d30c9de60b0d..df3f5e9d5f76 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -646,7 +646,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } entry = mk_pte(page, vma->vm_page_prot); if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = pte_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index 231929f119d9..2d148d82d907 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -198,7 +198,7 @@ static long change_pte_range(struct mmu_gather *tlb, if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pte_write(ptent) && can_change_pte_writable(vma, addr, ptent)) - ptent = pte_mkwrite(ptent); + ptent = pte_mkwrite(ptent, vma); ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); if (pte_needs_flush(oldpte, ptent)) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 53c3d916ff66..3db6f87c0aca 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -75,7 +75,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, if (page_in_cache && !vm_shared) writable = false; if (writable) - _dst_pte = pte_mkwrite(_dst_pte); + _dst_pte = pte_mkwrite(_dst_pte, dst_vma); if (wp_copy) _dst_pte = pte_mkuffd_wp(_dst_pte); From patchwork Sun Mar 19 00:15:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180142 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E985C74A5B for ; Sun, 19 Mar 2023 00:16:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 09A6D28000B; Sat, 18 Mar 2023 20:16:23 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EEF8F280001; Sat, 18 Mar 2023 20:16:22 
-0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C300A28000D; Sat, 18 Mar 2023 20:16:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 93CBE28000B for ; Sat, 18 Mar 2023 20:16:22 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 72F23AB0B7 for ; Sun, 19 Mar 2023 00:16:22 +0000 (UTC) X-FDA: 80583731004.22.8E5362E Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf04.hostedemail.com (Postfix) with ESMTP id 3E5CF40002 for ; Sun, 19 Mar 2023 00:16:20 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=aP5cLAJo; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf04.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184980; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=KA3CNIiTNbUMsggsIlv4U6nkNQ9hLyyEzwURtly/TkM=; b=5Z4P7wa/NrSoX8YS+muOtngDFhZWmWimNMziu5SeURcCNCUtogwFy0JgTpBBAkM/C6ELGd 9O2SdbqiZNKTvnfvDNFNRYhzcx1srJw665vhJClkShbc/DVarxzMQQvRG2NTpJZt/ZN58p ocBX1kSEp7YmTR2IXvvLexFjzoW/z0s= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=aP5cLAJo; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf04.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; 
t=1679184980; a=rsa-sha256; cv=none; b=MZ/5Gk5JHPYMppwpty3logn45J0ntb/Q27TEVIarq7xZWp61Rx1ohsSKrFXCd4hnNE40I3 9GmwSshARxlPVcYrEo1khrwa5LgUe1BUeqkmn+xHbsaW5tHhpq4cVEH64be1gVBZOzvEzJ WVzitfHsVHo3xBX7a4xAEZu1KwOBwoo= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184980; x=1710720980; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=8dF02s3tOW4RiD+j+9DDrpVET91W8ytsL0rmu61SgrY=; b=aP5cLAJofAEMuOnvdQDEuWB1HrrRRyx1iVaLNjIKVbK7wJJCZ+3Hjv3h iL+is8/4+wARx5G7u6FXGtmDZCdPtVPCtXQInDx8ZZZWGbdEo+ltFfJau o9xmfcxOYzF5NsU2dildPN3Rqc3Wk66H0EodW9G4bm/nF6Hs4k6t3XZss ZfSqGw/GwdArqu85sAIu6sT07+AITTHunbBUCuWvxsjJZXk0v3RpVryK/ /5ayebqldueV4yb4+lWhBNkpkG6ydIg7DIW5pnnk7QatX2DDJaBKuv3FV lCLoN7ZD4BWVRkAs+IRBRFF+fImHh9z9ULpiyEeYwEnlUoOSaJxzP6fU9 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491040" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491040" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672837" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672837" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:16 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 14/40] x86/mm: Introduce _PAGE_SAVED_DIRTY Date: Sat, 18 Mar 2023 17:15:09 -0700 Message-Id: <20230319001535.23210-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 3E5CF40002 X-Stat-Signature: uuahkqzq1kxsaa8deajiix9a6t7axxxu X-HE-Tag: 1679184979-543098 X-HE-Meta: U2FsdGVkX1/qLSBrqjdAmB9XgFaa+1R4Wsu5j1ALUx3fDOmn8ZeFlR8zTgUKRMn9hO/sMglj7D1P6iVSFmzwnX6CGvuCpJTkRRKjs19q4pCAUHm8HRoNWOjkSuLPs6hRVW1zYBT2zngaaGxNrmmNMILnQJ7BNM08cSV2dJqr6HqAgPKian1gnLq/M4XuD6stg9BEMRhVfkqdT+7k2E/zJQsVHHvXY6HSJ79/4JBKoO5RtXp43MfA3d3fapuWe7U+asnijtJvM/0QOzH+0N3/Z93yHFpck+1yZsJBgp4ttW1ulN3Y+P/avg++JhL8vlH91Egb9fv1zcggXf4oKmqRmyWFlYz6ScQA6UjHZ4t8o4DLGsl/asCd9zmAjQEOZW9AzaERt+0i0axAc/a6Legm0jOCak5Gh4bKxijJ2bkF7X1tL+3U/KIzt0q7/ZW9v5xGmzLtAMUSi4CxOBGUKmZ31gw3WmSzoaAdVa1Aum3Ny79A6zoqtIf46LPjZUCtRFs5aDH3M9Lr+Bt3MWK0qngYFnXGtPcDMafP7e5kSIkHxSAOawFBLuzWnKvzmonpp4xJBXo5TMIiVPsqdBZO1tvFic3uPO0/0d6pmyFagTmkc+WqUQGhT0+vZvZJ23cftIRHtbxHYBQs+NuP4Bj4il8+aXgYSzccsm/hHoHkIwneoRVUI5R6QS+pfBeohwM9R3qUkyqjswagpvxg2NP8YBS4twDr/5YxSQdgcpzFmCXO7n+FK6H1SeKcNgAgBczJJS5/gWavi2GT50Wlcs8DsFfycOVbVArKUYL6C8H+FVG8pkRASuiYSzm0WVUrzxVst3VUxNJeB11CrUwK2JIP5mjEQvPNKUa/ROpLRg9LnJ1Eoxk0lhEaM7zh1dA663BC6kWLcGnfrjAYCD7WYmDiWAxepGt0Kr5jAcKvlUbhDSASJhH+vwiS5NsFiHoMxNmpQeR1KbjCACswwm9nUzc+wRt QTN/zXpm 
kYx9Ra7NJLJ6IRdzDZtVCWVqK2Fz1JA6kHVD6NAArqipNJDZwanARCuepownfG/qv0cSywvTln0Q0mDUd1StjVaHwmJ9LdDz806OPgrKAdAb68RwrQPArGlWaQZD7ojgTZGLzV3i5yi1P7TfE3ghSgjiDUvVjz7n7yFV+lSgAd0JhPI7VNZ1BGE4ynIgfS88j3ckEvG4C01dUpyQkajspxiAxiQPSHJGP/XY+Am1l2YaJm0puxg3OkiB2xpZaEAJIqgsFOQ1UN6ftsTwLQ2Crye2qlxAxNlJYJpSBYx3q9zCjQ8hN0NjvZDJeCFB4Rpa7gC5+n/P4PUHyvmG6wo1JXwSONxyy+V4PDLTw9s8E477vC9IC0j6V8szHJ0/0q2eVjbSAnxi5bKFzN8jaHQMeRjGIBQ==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Some OSes have a greater dependence on software-available bits in PTEs than
Linux. That left the hardware architects looking for a way to represent a
new memory type (shadow stack) within the existing bits. They chose to
repurpose a lightly-used state: Write=0,Dirty=1. So in order to support
shadow stack memory, Linux should avoid creating memory with this PTE bit
combination unless it intends for it to be shadow stack.

The reason it's lightly used is that Dirty=1 is normally set by HW
_before_ a write. A write with a Write=0 PTE would typically only generate
a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and*
generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which
supports shadow stacks will no longer exhibit this oddity.

So that leaves Write=0,Dirty=1 PTEs created in software. To avoid
inadvertently creating shadow stack memory, in places where Linux normally
creates Write=0,Dirty=1, it can use the software-defined _PAGE_SAVED_DIRTY
in place of the hardware _PAGE_DIRTY. In other words, whenever Linux needs
to create Write=0,Dirty=1, it instead creates Write=0,SavedDirty=1 except
for shadow stack, which is Write=0,Dirty=1.

There are six bits left available to software in the 64-bit PTE after
consuming a bit for _PAGE_SAVED_DIRTY. No space is consumed in 32-bit
kernels because shadow stacks are not enabled there.
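
For illustration only (this is not part of the patch), the bit convention
described above can be modeled in a few lines of userspace C. The sketch
assumes shadow stack is enabled. The names model_wrprotect_keep_dirty() and
model_dirty() are hypothetical; bits 1 (Write) and 6 (Dirty) are the
architectural positions, and bit 58 is assumed from the _PAGE_BIT_SOFTW5
definition in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Write and Dirty are hardware bits; SavedDirty is the software bit (assumed bit 58). */
#define PAGE_RW          ((pteval_t)1 << 1)
#define PAGE_DIRTY       ((pteval_t)1 << 6)
#define PAGE_SAVED_DIRTY ((pteval_t)1 << 58)

/* Write=0,Dirty=1 is the encoding the hardware treats as shadow stack. */
bool is_shadow_stack_encoding(pteval_t v)
{
	return !(v & PAGE_RW) && (v & PAGE_DIRTY);
}

/*
 * Hypothetical helper: write-protect a PTE value. If it was dirty, keep
 * that information in SavedDirty instead of the hardware Dirty bit.
 */
pteval_t model_wrprotect_keep_dirty(pteval_t v)
{
	v &= ~PAGE_RW;
	if (v & PAGE_DIRTY) {
		v &= ~PAGE_DIRTY;
		v |= PAGE_SAVED_DIRTY;
	}
	/* The result must never look like shadow stack memory. */
	assert(!is_shadow_stack_encoding(v));
	return v;
}

/* From the kernel's point of view, either bit means "dirty". */
bool model_dirty(pteval_t v)
{
	return (v & (PAGE_DIRTY | PAGE_SAVED_DIRTY)) != 0;
}
```

In this model a Write=1,Dirty=1 value becomes Write=0,SavedDirty=1 when
write-protected, and model_dirty() still reports it as dirty.
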
Implement only the infrastructure for _PAGE_SAVED_DIRTY. Changes to
actually begin creating _PAGE_SAVED_DIRTY PTEs will follow once other
pieces are in place.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Remove trailing whitespace (dhansen, Boris)

v7:
 - Use lightly edited comment verbiage from (David Hildenbrand)
 - Update commit log to reduce verbosity (David Hildenbrand)

v6:
 - Rename _PAGE_COW to _PAGE_SAVED_DIRTY (David Hildenbrand)
 - Add _PAGE_SAVED_DIRTY to _PAGE_CHG_MASK

v5:
 - Fix log, comments and whitespace (Boris)
 - Remove capitalization on shadow stack (Boris)
---
 arch/x86/include/asm/pgtable.h       | 79 ++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable_types.h | 50 +++++++++++++++---
 arch/x86/include/asm/tlbflush.h      |  3 +-
 3 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 66c514808276..7360783f2140 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -301,6 +301,45 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 	return native_make_pte(v & ~clear);
 }
 
+/*
+ * Write protection operations can result in Dirty=1,Write=0 PTEs. But in the
+ * case of X86_FEATURE_USER_SHSTK, the software SavedDirty bit is used, since
+ * the Dirty=1,Write=0 will result in the memory being treated as shadow stack
+ * by the HW. So when creating dirty, write-protected memory, a software bit is
+ * used _PAGE_BIT_SAVED_DIRTY. The following functions pte_mksaveddirty() and
+ * pte_clear_saveddirty() take a conventional dirty, write-protected PTE
+ * (Write=0,Dirty=1) and transition it to the shadow stack compatible
+ * version. (Write=0,SavedDirty=1).
+ */ +static inline pte_t pte_mksaveddirty(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + pte = pte_clear_flags(pte, _PAGE_DIRTY); + return pte_set_flags(pte, _PAGE_SAVED_DIRTY); +} + +static inline pte_t pte_clear_saveddirty(pte_t pte) +{ + /* + * _PAGE_SAVED_DIRTY is unnecessary on !X86_FEATURE_USER_SHSTK kernels, + * since the HW dirty bit can be used without creating shadow stack + * memory. See the _PAGE_SAVED_DIRTY definition for more details. + */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pte; + + /* + * PTE is getting copied-on-write, so it will be dirtied + * if writable, or made shadow stack if shadow stack and + * being copied on access. Set the dirty bit for both + * cases. + */ + pte = pte_set_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_SAVED_DIRTY); +} + static inline pte_t pte_wrprotect(pte_t pte) { return pte_clear_flags(pte, _PAGE_RW); @@ -420,6 +459,26 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear) return native_make_pmd(v & ~clear); } +/* See comments above pte_mksaveddirty() */ +static inline pmd_t pmd_mksaveddirty(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_set_flags(pmd, _PAGE_SAVED_DIRTY); +} + +/* See comments above pte_mksaveddirty() */ +static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pmd; + + pmd = pmd_set_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_SAVED_DIRTY); +} + static inline pmd_t pmd_wrprotect(pmd_t pmd) { return pmd_clear_flags(pmd, _PAGE_RW); @@ -491,6 +550,26 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear) return native_make_pud(v & ~clear); } +/* See comments above pte_mksaveddirty() */ +static inline pud_t pud_mksaveddirty(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_clear_flags(pud, 
_PAGE_DIRTY); + return pud_set_flags(pud, _PAGE_SAVED_DIRTY); +} + +/* See comments above pte_mksaveddirty() */ +static inline pud_t pud_clear_saveddirty(pud_t pud) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return pud; + + pud = pud_set_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_SAVED_DIRTY); +} + static inline pud_t pud_mkold(pud_t pud) { return pud_clear_flags(pud, _PAGE_ACCESSED); diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 0646ad00178b..8f266788c0d7 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -21,7 +21,8 @@ #define _PAGE_BIT_SOFTW2 10 /* " */ #define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SOFTW4 58 /* available for programmer */ +#define _PAGE_BIT_SOFTW4 57 /* available for programmer */ +#define _PAGE_BIT_SOFTW5 58 /* available for programmer */ #define _PAGE_BIT_PKEY_BIT0 59 /* Protection Keys, bit 1/4 */ #define _PAGE_BIT_PKEY_BIT1 60 /* Protection Keys, bit 2/4 */ #define _PAGE_BIT_PKEY_BIT2 61 /* Protection Keys, bit 3/4 */ @@ -34,6 +35,15 @@ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 +/* + * Indicates a Saved Dirty bit page. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit */ +#else +#define _PAGE_BIT_SAVED_DIRTY 0 +#endif + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -117,6 +127,25 @@ #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif +/* + * The hardware requires shadow stack to be Write=0,Dirty=1. However, + * there are valid cases where the kernel might create read-only PTEs that + * are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty tracking). 
In + * this case, the _PAGE_SAVED_DIRTY bit is used instead of the HW-dirty bit, + * to avoid creating a wrong "shadow stack" PTEs. Such PTEs have + * (Write=0,SavedDirty=1,Dirty=0) set. + * + * Note that on processors without shadow stack support, the + * _PAGE_SAVED_DIRTY remains unused. + */ +#ifdef CONFIG_X86_USER_SHADOW_STACK +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SAVED_DIRTY) +#else +#define _PAGE_SAVED_DIRTY (_AT(pteval_t, 0)) +#endif + +#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_SAVED_DIRTY) + #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) /* @@ -125,9 +154,9 @@ * instance, and is *not* included in this mask since * pte_modify() does modify it. */ -#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ - _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ +#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ + _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY_BITS | \ + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC | \ _PAGE_UFFD_WP) #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) @@ -186,12 +215,17 @@ enum page_cache_mode { #define PAGE_READONLY __pg(__PP| 0|_USR|___A|__NX| 0| 0| 0) #define PAGE_READONLY_EXEC __pg(__PP| 0|_USR|___A| 0| 0| 0| 0) -#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) -#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) -#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) -#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) +/* + * Page tables needs to have Write=1 in order for any lower PTEs to be + * writable. 
This includes shadow stack memory (Write=0, Dirty=1) + */ #define _PAGE_TABLE_NOENC (__PP|__RW|_USR|___A| 0|___D| 0| 0) #define _PAGE_TABLE (__PP|__RW|_USR|___A| 0|___D| 0| 0| _ENC) +#define _KERNPG_TABLE_NOENC (__PP|__RW| 0|___A| 0|___D| 0| 0) +#define _KERNPG_TABLE (__PP|__RW| 0|___A| 0|___D| 0| 0| _ENC) + +#define __PAGE_KERNEL (__PP|__RW| 0|___A|__NX|___D| 0|___G) +#define __PAGE_KERNEL_EXEC (__PP|__RW| 0|___A| 0|___D| 0|___G) #define __PAGE_KERNEL_RO (__PP| 0| 0|___A|__NX| 0| 0|___G) #define __PAGE_KERNEL_ROX (__PP| 0| 0|___A| 0| 0| 0|___G) #define __PAGE_KERNEL_NOCACHE (__PP|__RW| 0|___A|__NX|___D| 0|___G| __NC) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index cda3118f3b27..6c5ef14060a8 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -273,7 +273,8 @@ static inline bool pte_flags_need_flush(unsigned long oldflags, const pteval_t flush_on_clear = _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED; const pteval_t software_flags = _PAGE_SOFTW1 | _PAGE_SOFTW2 | - _PAGE_SOFTW3 | _PAGE_SOFTW4; + _PAGE_SOFTW3 | _PAGE_SOFTW4 | + _PAGE_SAVED_DIRTY; const pteval_t flush_on_change = _PAGE_RW | _PAGE_USER | _PAGE_PWT | _PAGE_PCD | _PAGE_PSE | _PAGE_GLOBAL | _PAGE_PAT | _PAGE_PAT_LARGE | _PAGE_PKEY_BIT0 | _PAGE_PKEY_BIT1 | From patchwork Sun Mar 19 00:15:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180143 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7BB0FC761A6 for ; Sun, 19 Mar 2023 00:16:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1123C28000D; Sat, 18 Mar 2023 20:16:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 09928280001; Sat, 18 Mar 2023 20:16:24 -0400 (EDT) 
X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DDFAD28000D; Sat, 18 Mar 2023 20:16:23 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id C87F1280001 for ; Sat, 18 Mar 2023 20:16:23 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id A3C39C0CF8 for ; Sun, 19 Mar 2023 00:16:23 +0000 (UTC) X-FDA: 80583731046.14.430830F Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf21.hostedemail.com (Postfix) with ESMTP id A11801C0014 for ; Sun, 19 Mar 2023 00:16:21 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=fspsw9ew; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184981; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=+6GvvYu5yHulXslGrW9yeLU+ZeVo4mtLKdyIMGCutuA=; b=WDGZxzD/DDT/N4LYp/P+ZQRzs1bhFzKDOBt43KJO6XU/jV9rPTS6WvlI4qlMXD8q7c8QOd fegBoDACLyEJVYJSHz1OOcG/v4b7BxCrICK9nNki7hfvgsO8vEkEXkLEjlGD1G8E0KcjYy TcILP+93jDEvuz80IegV051W6l+dUxM= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=fspsw9ew; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; 
t=1679184981; a=rsa-sha256; cv=none; b=qiIAOX4mXYUvtQAJJCFXgr5iLbpqhoXtURmMuSNlhP0JNgu3+wI4+vSa1K0lWa3TPoIa9z KQ5sy1dIcFUbCJ5NbsY41wMlvCTfYue+FWZZZ+lg0DXb1LaR0S8zgGVW1r0MZBEN4oOWJq iV4pDRbMbKFdtOLqNLv/yE7qVhj4B9I= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184981; x=1710720981; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=bCew3mJcfQObTYD8BfRHQ7rk2S0psN/tr65S9eiRcMg=; b=fspsw9ewqJpGfvmQ4lqCVzDtO19/eaH0XVDnloBYxIMiUkKAIuaGQjqK dpqUTphCgFU7KtcUPZ3ACJ6m+ecjfTC3g4ALPtAnIhjQE8OeenmkeHkRp 6w+JRJKPjBTo4Y8jA2Mv1ux+mG9JL2gC4+nrf8ZY4iuX4GwzH8T4hoyYs fAtoWat4qVOr/V7sWNT1HwnO5NeTq4EhTo9iDSL+mV5yd5bFkhdvWd3tb skYTgWBNCSHjY3mZWYalThidPkfwXwXz1CC9+yI3h2NA14wMrz81B02bq zq3Y4qJl3AB0KQIz4MgCka/IBMbqmzW53c49V5VWeEJLqfEEHZBtVF2p5 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491062" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491062" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672841" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672841" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:18 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 15/40] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY Date: Sat, 18 Mar 2023 17:15:10 -0700 Message-Id: <20230319001535.23210-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: A11801C0014 X-Rspam-User: X-Stat-Signature: tsecuk3a9qxp1zwfcg7m1rgx4pfyn8gf X-HE-Tag: 1679184981-811035 X-HE-Meta: U2FsdGVkX1+pMs6QdhullzC8kmaJL/E2VNrHiQqT5j8LaGWLqUhGTk4U+V9h9i3GzurGb9iQ7xAvMRtd+3YGzwtX4HPjyDZdAYfjW4uQ/HNC+Rmz4HYJxS5JjzfaJ/mr+I4SYpqCZlJgzSmNwevR4w5cixbE8Wed+vKl3b/++4r5fOWQ6c8srnqZgTAElvmvNc8mwYsppmdLlqlUPgXnhbMg1UBuRFTFHD3IOPFI9B9ttc8YA74j3RlmkyoYNUql1b0YTdnF6VbZt82d+GdYEv8P/Sivr5/cZEhcfuoNys30fSgZu2/PhaGa/kInNcVoXIU1C31BI2crup2BG83aajZfevV4ROc/j+tLhWFbLFIWhJvoqc2nyX1YMN+r/h/lLyIK4abg85/nsssd2PAMzG5V+4HRumIE86bC+cwTU0iWGhrb7i0BWDKIzkquWivNaa9UcfHbyv2bDASI2QSEc83YcHEcRZrYaUzbN6bO+jHP9ShQVvA6uNaDswNUWQnXT14gptl+kKgDnZcHjTzEECIVwnOvkriNDk4npn9RmoWH4Yu30rcSdftwETGbdaXGHZV4/dPyyVnT2rpJvjy3EZ7rxOqv5C7pCMEQCdvv2YFD8Owry964/8ob4iIrXWSvwEIiqEOEk0yHzvWcYVbD1hJf7M8YYQWsFA+F9lmah6BXM3Sih4L95omSHbvLB8aMzBsGPk5V1J1yEuqJpT0YnymJBUlN97EDUSb0UrogBlcpTIpVAtoWY4TN+tdW6mPXOmjnSFt2AmYr3biNblW//WzH94ut+Y7jUEkr8ZmwdSQ1TM5bq/oV6bVd+XK44jyrMJLtaPs97clZHHviV3NwSf58XVFdiqpLI/TTJXjDgk7WtmFXVKumX/yWExZ5qPb81Hr8swlqU8KDdObvaJ0IhDPfEWHEu/YeTqVSopwq7qj8DsmcwA330IFuvGnALznQn4Zih/IEJk28pQipzvE 0U3c5tbx 
tSsaddDhSWPuy4wroyh060qnP91eg/duAhjBAVd3Y0yDnVMc/HPkpg7PZmbPT+93TQ3UJtLNyVLMjqDeI7u3m8kZzqjd8aNy5VwqO0cYPLa42jGVsv7LEeNL8zFxtXHMtJe+FWjXaZ5Uch2584l1VirR3f2BjM4cFDAsgOnzSKwTskiHu88gjcMnQCzc+3xLg7NifauOS2jo2tgW55w50Q98f/xqB/83G8NHU254uRmx4WIXpwodi++ywYrcGMslHP1R65heO2PAXfw8kF9BbkuuXg+IVDcfSU954OP32DFRvAWm4rG+gsTF0u7eIacracVtMF1uEtkIxiaxB4GsgFbhO7QpzqFITGMS/cMirseaCp6K9gU4Lw4IYgB2lFX/YbTQOIpy82sihiuU=
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

When shadow stack is in use, Write=0,Dirty=1 PTEs are reserved for shadow
stack. Copy-on-write PTEs then have Write=0,SavedDirty=1.

When a PTE goes from Write=1,Dirty=1 to Write=0,SavedDirty=1, it could
become a transient shadow stack PTE in two cases:

1. Some processors can start a write but end up seeing a Write=0 PTE by
   the time they get to the Dirty bit, creating a transient shadow stack
   PTE. However, this will not occur on processors supporting shadow
   stack, and a TLB flush is not necessary.

2. When _PAGE_DIRTY is replaced with _PAGE_SAVED_DIRTY non-atomically, a
   transient shadow stack PTE can be created as a result. Thus, prevent
   that with cmpxchg.

In the case of pmdp_set_wrprotect(), for nopmd configs the ->pmd operated
on does not exist and the logic would need to be different. Although the
extra functionality will normally be optimized out when user shadow
stacks are not configured, also exclude it in the preprocessor stage so
that it will still compile. User shadow stack is not supported there by
Linux anyway. Leave the cpu_feature_enabled() check so that the
functionality also gets disabled based on runtime detection of the
feature.

Similarly, compile it out in ptep_set_wrprotect() due to a clang warning
on i386. Like above, the code path should get optimized out on i386
since shadow stack is not supported on 32-bit kernels, but this makes
the compiler happy.
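
The cmpxchg approach can be sketched in userspace with C11 atomics. This is
a model under stated assumptions, not the kernel code:
model_ptep_set_wrprotect() and model_wrprotect() are hypothetical stand-ins
for ptep_set_wrprotect() and pte_wrprotect(), atomic_compare_exchange_weak()
stands in for try_cmpxchg(), and the move from Dirty to SavedDirty is
assumed to happen inside the write-protect helper, as the series intends
once the later patches are applied.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Write and Dirty are hardware bits; SavedDirty is the software bit (assumed bit 58). */
#define PAGE_RW          ((pteval_t)1 << 1)
#define PAGE_DIRTY       ((pteval_t)1 << 6)
#define PAGE_SAVED_DIRTY ((pteval_t)1 << 58)

/* Hypothetical pure helper: clear Write and move Dirty to SavedDirty. */
pteval_t model_wrprotect(pteval_t v)
{
	v &= ~PAGE_RW;
	if (v & PAGE_DIRTY)
		v = (v & ~PAGE_DIRTY) | PAGE_SAVED_DIRTY;
	return v;
}

/*
 * Publish the write-protected value in a single atomic step. If another
 * agent changed the entry in the meantime (for example, by setting
 * Dirty), the compare-exchange fails, reloads the current value into
 * 'old', and the new value is recomputed from it.
 */
void model_ptep_set_wrprotect(_Atomic pteval_t *ptep)
{
	pteval_t old = atomic_load(ptep);
	pteval_t next;

	do {
		next = model_wrprotect(old);
	} while (!atomic_compare_exchange_weak(ptep, &old, next));

	/* The value that was stored is never Write=0,Dirty=1. */
	assert(!(next & PAGE_RW) && !(next & PAGE_DIRTY));
}
```

Because the new value is always derived from the value the compare-exchange
observed, no intermediate Write=0,Dirty=1 state is ever stored in the entry
by this model.
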
Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the cmpxchg solution.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v6:
 - Fix comment and log to update for _PAGE_COW being replaced with
   _PAGE_SAVED_DIRTY.

v5:
 - Commit log verbiage and formatting (Boris)
 - Remove capitalization on shadow stack (Boris)
 - Fix i386 warning on recent clang

v3:
 - Remove unnecessary #ifdef (Dave Hansen)

v2:
 - Compile out some code due to clang build error
 - Clarify commit log (dhansen)
 - Normalize PTE bit descriptions between patches (dhansen)
 - Update comment with text from (dhansen)
---
 arch/x86/include/asm/pgtable.h | 35 ++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7360783f2140..349fcab0405a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1192,6 +1192,23 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	/*
+	 * Avoid accidentally creating shadow stack PTEs
+	 * (Write=0,Dirty=1). Use cmpxchg() to prevent races with
+	 * the hardware setting Dirty=1.
+ */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pte_t old_pte, new_pte; + + old_pte = READ_ONCE(*ptep); + do { + new_pte = pte_wrprotect(old_pte); + } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte)); + + return; + } +#endif clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte); } @@ -1244,6 +1261,24 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp) { +#ifdef CONFIG_X86_USER_SHADOW_STACK + /* + * Avoid accidentally creating shadow stack PTEs + * (Write=0,Dirty=1). Use cmpxchg() to prevent races with + * the hardware setting Dirty=1. + */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) { + pmd_t old_pmd, new_pmd; + + old_pmd = READ_ONCE(*pmdp); + do { + new_pmd = pmd_wrprotect(old_pmd); + } while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd)); + + return; + } +#endif + clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } From patchwork Sun Mar 19 00:15:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180144 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF822C74A5B for ; Sun, 19 Mar 2023 00:16:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0C36F28000E; Sat, 18 Mar 2023 20:16:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 04A56280001; Sat, 18 Mar 2023 20:16:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB71928000E; Sat, 18 Mar 2023 20:16:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id BF2C1280001 for ; 
Sat, 18 Mar 2023 20:16:25 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 9D58640C8B for ; Sun, 19 Mar 2023 00:16:25 +0000 (UTC) X-FDA: 80583731130.23.857AD96 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf11.hostedemail.com (Postfix) with ESMTP id 90DA640012 for ; Sun, 19 Mar 2023 00:16:23 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=aHMgeMMK; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184983; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=yh44XlmxZ3TpyhdCF8bY5/DkNMqHR+ekMlxGg2PUBdY=; b=X18o4CMubeCSpbvYy3jnuPMUkAa9GZIgPj7roeUGbCcuO6bPD9NCnjioaQmVTJll/lZPkn LlcAIsg+aOzV407deRQzWnYhIfO/+gvScaDBckhKyyx+LN/tlKrSKynx10dbei8V/pcBCB S75V7J5sNR7e9t+jNh8nUplvB0SzPZU= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=aHMgeMMK; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184983; a=rsa-sha256; cv=none; b=djWDi/UKpht0b1VkUsCjZXLSDJMQMf1kyVB6Lp3BMXkFZAlx0oYWEc50/xr2qJt1ggbp6L KqngeP2JhOpLkFNiFbxpSJVAOByy3LPk/zqHDP5J6sYrSfce1+rIW913HTd3dn2/ntDLo2 bUp6YnRD6uMu8a3pZtSuh28p7HXPpAw= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184983; x=1710720983; 
h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=V5J+b+gsGH0IqMjFO6EWI1A0U+2t/hj/jJS5+EQE7p4=; b=aHMgeMMKvr6/HW4jGs6XBli9WHsA0LXHJQ6ssb9K+bfOmHlzghoGWvOl 9VCqSf4dg0Aqd3rKJM8vQxnNDCXwHgsyJB+SwlVxHL00pvCpYPLdsnyrR 8QhMH/GxNRbPR22YBuVGf/Xk/0t+5ksv2prgvf3tcrS65x+jXbqQJ10ia x6c1CAnZ8ksOJmtaZNk8oRysqJ2AXvxICoAivhykcKcRoufBSV8kBT4gs cL3NlxQeSKeOgdgbgzpV7ZjuujDo6YlvbsUOIz5WkhtA1RoCK3LdtgQQj nn6OP/alfrBrhXaJEstYs4jWxQI45ya1dYT3PnMn5MLUB8mN/iDQQUhtL g==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491084" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491084" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672845" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672845" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:20 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 16/40] x86/mm: Start actually marking _PAGE_SAVED_DIRTY Date: Sat, 18 Mar 2023 17:15:11 -0700 Message-Id: <20230319001535.23210-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 90DA640012 X-Rspam-User: X-Stat-Signature: nj5k6w5nmowgyfodh9tnw6iky1oi979d X-HE-Tag: 1679184983-668226 X-HE-Meta: U2FsdGVkX1+QmtMY2vcOodVTn9gDHDN7GpyXzpJIFnvjiTPtZwNeWLbqLAS8g6EBCZCCt83sEIHPBhGRrRSYfo9BTGjUEy2kOrqDiVR0dUSfOuryvBPF7ql0MPwKcIum255xUp40NPusYxO7rm80GPkriU/isK1MPQlb3ZxHWZL0N3uoiGTiuoy3aRwJH7gBRJC/JHCjI75eGx9+za9++xvC8M1jX9VMjYelc0goxdZ1StgsvJFZki0VJm3bvJbNx10L7yu/7c5NTe3qs0HDIQ4rwBOmtgffIknWb0KtUcf0qXhudToo4zp0/iVgcS2F9dyvnJSDv8SBJlxzHyrhIEHNPX/E8J6LBbTqMP9I4QpBq/NlWwtESclq08ud8+F9iXtZ4IWF6s/ZckbNpcvcy+H84IE5mMInHmf8GCNYqqJiTM9vUu/Ub1vBznj5pe+y/S24ImGweE5wJjizG9VAno5Y9egYJkk99ozkGngLjXDbjMBS7C7U9piWPRUxnC6LGgVj8YNwKLJjetwQe4HS34A9P0ENQe8MRON0kAFv+QNKm2/T3UdmCN8lmQz7Lna+/zLHfYnlkAj+Y7TuDUZDwF24yUHdVKJPXq4/Zm1biFMIZpgErA4Mq6ftd7JQsTdGerjEXxKK3xzaf1FC7g/pSrEqkuXT/qwO2Oil57Su2TH30YdB6bEPnpwxRETpa9Jgzj77K6RzbpppsF7d64O65/pJYyt5GNbtw+yacM76HsdESoWmPJcMVB9OoOFj1y6zb0dtNkYH/6lj2QJEqctuo/Q7TxaS2KXRnize9bdN5+3kcB1FypOTxLgSYcsv6Mxe2oDCYktNX2/durYIqoe1ZchKDDfFE80O0O5wln9N3PiLsYVTX9beuqiDFAVF1soVnE2mjeitmUx4oo8j1uI9poYmVO3qeXI9GTDvD7EZnRWUJh0oRGGM4+dJrFyg8kUJQ4rFqATMw/hv3slIC3H D3BTMi+e 
joOSsoLJdaKQir/XCd/J1L2LHZxgsgoID33uwU+mNo/3DjC6UqDjr709TUCOmUMIQ1RD+8hH8NFMK5eDXSYcHTdXzTnQMAv7E1hEsBRuFQyO6qMcN8iGXPV2tvZo8As5xbLlcjDgq1CXf2XvzCkf756EtkrZQcXoY0DNxOOQBVA2LWOSU/MmT/Lu1wf1CwdXymabYprWpK1b2Qd/7AU/C4ZJBQyANWp4h0Ls2yYzbW6bUMgk0pl7z/3zwSrtkhGTlrHzDsp3o110rpx7WZd4kbgoBqbS1Hk1ceX67yOaOHmyoXUDBbSXFpx5w6hNsJQDlgpj77dXcgtoxPBE+/zdPpOUbqPSkTh51j3MFzFUCoWKsIT4dvvDBkPp+kBwmm7uKDZEOe0Im1tCMcm0CYZovjohbTg==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

The recently introduced _PAGE_SAVED_DIRTY should be used instead of the HW Dirty bit whenever a PTE is Write=0, in order to not inadvertently create shadow stack PTEs. Update pte_mk*() helpers to do this, and apply the same changes to pmd and pud.

For pte_modify() this is a bit trickier. It takes a "raw" pgprot_t which was not necessarily created with any of the existing PTE bit helpers. That means that it can return a pte_t with Write=0,Dirty=1, a shadow stack PTE, when it did not intend to create one. Modify it to also move _PAGE_DIRTY to _PAGE_SAVED_DIRTY.

To avoid creating Write=0,Dirty=1 PTEs, pte_modify() needs to avoid:
1. Marking Write=0 PTEs Dirty=1
2. Marking Dirty=1 PTEs Write=0

The first case cannot happen as the existing behavior of pte_modify() is to filter out any Dirty bit passed in newprot. Handle the second case by shifting _PAGE_DIRTY=1 to _PAGE_SAVED_DIRTY=1 if the PTE was write protected by the pte_modify() call.

Apply the same changes to pmd_modify().
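For readers who want to experiment with the rule outside the kernel, the bit shuffling described above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the kernel code: every MODEL_* and model_* name is hypothetical, the bit positions are illustrative, the behavior of the mksaveddirty/clear_saveddirty helpers is inferred from the comments in this patch, and _PAGE_SOFT_DIRTY is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions (assumed); the real masks live in the x86 headers. */
#define MODEL_PAGE_RW          (UINT64_C(1) << 1)
#define MODEL_PAGE_DIRTY       (UINT64_C(1) << 6)
#define MODEL_PAGE_SAVED_DIRTY (UINT64_C(1) << 58)
#define MODEL_PAGE_DIRTY_BITS  (MODEL_PAGE_DIRTY | MODEL_PAGE_SAVED_DIRTY)

/* Hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_USER_SHSTK). */
static bool model_shstk_enabled = true;

/* Dirty in either the hardware bit or the software bit. */
static inline bool model_dirty(uint64_t pte)
{
	return (pte & MODEL_PAGE_DIRTY_BITS) != 0;
}

/* Write=0,Dirty=1 is the shadow stack encoding. */
static inline bool model_shstk(uint64_t pte)
{
	return model_shstk_enabled &&
	       (pte & (MODEL_PAGE_RW | MODEL_PAGE_DIRTY)) == MODEL_PAGE_DIRTY;
}

/* Assumed helper behavior: move hardware Dirty to the software bit. */
static inline uint64_t model_mksaveddirty(uint64_t pte)
{
	if (!model_shstk_enabled)
		return pte;
	return (pte & ~MODEL_PAGE_DIRTY) | MODEL_PAGE_SAVED_DIRTY;
}

/* Assumed helper behavior: move the software bit back to hardware Dirty. */
static inline uint64_t model_clear_saveddirty(uint64_t pte)
{
	if (!model_shstk_enabled)
		return pte;
	return (pte & ~MODEL_PAGE_SAVED_DIRTY) | MODEL_PAGE_DIRTY;
}

/* Clearing Write must not leave hardware Dirty behind. */
static inline uint64_t model_wrprotect(uint64_t pte)
{
	pte &= ~MODEL_PAGE_RW;
	if (model_dirty(pte))
		pte = model_mksaveddirty(pte);
	return pte;
}

/* Once writable, hardware Dirty is safe to use again. */
static inline uint64_t model_mkwrite(uint64_t pte)
{
	pte |= MODEL_PAGE_RW;
	if (model_dirty(pte))
		pte = model_clear_saveddirty(pte);
	return pte;
}

/* Pick the software bit when the entry is not (logically) writable. */
static inline uint64_t model_mkdirty(uint64_t pte)
{
	bool writable = (pte & MODEL_PAGE_RW) || model_shstk(pte);
	uint64_t dirty = MODEL_PAGE_DIRTY;

	if (model_shstk_enabled && !writable)
		dirty = MODEL_PAGE_SAVED_DIRTY;
	return pte | dirty;
}

/* pte_modify()-style fixup: only acts when the call removed Write. */
static inline uint64_t model_modify_fixup(uint64_t oldval, uint64_t val)
{
	bool wr_protected = (oldval & MODEL_PAGE_RW) && !(val & MODEL_PAGE_RW);

	if (model_shstk_enabled && wr_protected && (val & MODEL_PAGE_DIRTY))
		val = model_mksaveddirty(val);
	return val;
}

/* Walk one entry through write-protect and back. */
static inline void model_selfcheck(void)
{
	uint64_t pte = MODEL_PAGE_RW | MODEL_PAGE_DIRTY;
	uint64_t wp = model_wrprotect(pte);

	assert(wp == MODEL_PAGE_SAVED_DIRTY);
	assert(model_dirty(wp) && !model_shstk(wp));
	assert(model_mkwrite(wp) == pte);
}
```

Calling model_selfcheck() from a small main() exercises the round trip: a writable dirty entry that is write protected ends up with only the software bit set, still reports dirty, is not mistaken for shadow stack, and returns to Write=1,Dirty=1 when made writable again.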
Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v6: - Rename _PAGE_COW to _PAGE_SAVED_DIRTY (David Hildenbrand) - Open code _PAGE_SAVED_DIRTY part in pte_modify() (Boris) - Change the logic so the open coded part is not too ugly - Merge pte_modify() patch with this one because of the above v4: - Break part patch for better bisectability --- arch/x86/include/asm/pgtable.h | 168 ++++++++++++++++++++++++++++----- 1 file changed, 145 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 349fcab0405a..05dfdbdf96b4 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -124,9 +124,17 @@ extern pmdval_t early_pmd_flags; * The following only work if pte_present() is true. * Undefined behaviour if not.. */ -static inline int pte_dirty(pte_t pte) +static inline bool pte_dirty(pte_t pte) { - return pte_flags(pte) & _PAGE_DIRTY; + return pte_flags(pte) & _PAGE_DIRTY_BITS; +} + +static inline bool pte_shstk(pte_t pte) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY; } static inline int pte_young(pte_t pte) @@ -134,9 +142,18 @@ static inline int pte_young(pte_t pte) return pte_flags(pte) & _PAGE_ACCESSED; } -static inline int pmd_dirty(pmd_t pmd) +static inline bool pmd_dirty(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_DIRTY; + return pmd_flags(pmd) & _PAGE_DIRTY_BITS; +} + +static inline bool pmd_shstk(pmd_t pmd) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return false; + + return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY | _PAGE_PSE)) == + (_PAGE_DIRTY | _PAGE_PSE); } #define pmd_young pmd_young @@ -145,9 +162,9 @@ static inline int pmd_young(pmd_t pmd) return pmd_flags(pmd) & _PAGE_ACCESSED; } -static inline int 
pud_dirty(pud_t pud) +static inline bool pud_dirty(pud_t pud) { - return pud_flags(pud) & _PAGE_DIRTY; + return pud_flags(pud) & _PAGE_DIRTY_BITS; } static inline int pud_young(pud_t pud) @@ -157,13 +174,21 @@ static inline int pud_young(pud_t pud) static inline int pte_write(pte_t pte) { - return pte_flags(pte) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte); } #define pmd_write pmd_write static inline int pmd_write(pmd_t pmd) { - return pmd_flags(pmd) & _PAGE_RW; + /* + * Shadow stack pages are logically writable, but do not have + * _PAGE_RW. Check for them separately from _PAGE_RW itself. + */ + return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd); } #define pud_write pud_write @@ -342,7 +367,16 @@ static inline pte_t pte_clear_saveddirty(pte_t pte) static inline pte_t pte_wrprotect(pte_t pte) { - return pte_clear_flags(pte, _PAGE_RW); + pte = pte_clear_flags(pte, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PTE (Write=0,Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + if (pte_dirty(pte)) + pte = pte_mksaveddirty(pte); + return pte; } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -380,7 +414,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return pte_clear_flags(pte, _PAGE_DIRTY); + return pte_clear_flags(pte, _PAGE_DIRTY_BITS); } static inline pte_t pte_mkold(pte_t pte) @@ -395,7 +429,19 @@ static inline pte_t pte_mkexec(pte_t pte) static inline pte_t pte_mkdirty(pte_t pte) { - return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pteval_t dirty = _PAGE_DIRTY; + + /* Avoid creating Dirty=1,Write=0 PTEs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pte_write(pte)) + dirty = _PAGE_SAVED_DIRTY; + + return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pte_t pte_mkwrite_shstk(pte_t pte) +{ + /* pte_clear_saveddirty() also sets Dirty=1 */ + return pte_clear_saveddirty(pte); } static inline pte_t pte_mkyoung(pte_t pte) @@ -412,7 +458,12 @@ struct vm_area_struct; static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { - return pte_mkwrite_kernel(pte); + pte = pte_mkwrite_kernel(pte); + + if (pte_dirty(pte)) + pte = pte_clear_saveddirty(pte); + + return pte; } static inline pte_t pte_mkhuge(pte_t pte) @@ -481,7 +532,15 @@ static inline pmd_t pmd_clear_saveddirty(pmd_t pmd) static inline pmd_t pmd_wrprotect(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_RW); + pmd = pmd_clear_flags(pmd, _PAGE_RW); + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PMD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + if (pmd_dirty(pmd)) + pmd = pmd_mksaveddirty(pmd); + return pmd; } #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP @@ -508,12 +567,23 @@ static inline pmd_t pmd_mkold(pmd_t pmd) static inline pmd_t pmd_mkclean(pmd_t pmd) { - return pmd_clear_flags(pmd, _PAGE_DIRTY); + return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS); } static inline pmd_t pmd_mkdirty(pmd_t pmd) { - return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pmdval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PMDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pmd_write(pmd)) + dirty = _PAGE_SAVED_DIRTY; + + return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY); +} + +static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) +{ + return pmd_clear_saveddirty(pmd); } static inline pmd_t pmd_mkdevmap(pmd_t pmd) @@ -533,7 +603,12 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { - return pmd_set_flags(pmd, _PAGE_RW); + pmd = pmd_set_flags(pmd, _PAGE_RW); + + if (pmd_dirty(pmd)) + pmd = pmd_clear_saveddirty(pmd); + + return pmd; } static inline pud_t pud_set_flags(pud_t pud, pudval_t set) @@ -577,17 +652,32 @@ static inline pud_t pud_mkold(pud_t pud) static inline pud_t pud_mkclean(pud_t pud) { - return pud_clear_flags(pud, _PAGE_DIRTY); + return pud_clear_flags(pud, _PAGE_DIRTY_BITS); } static inline pud_t pud_wrprotect(pud_t pud) { - return pud_clear_flags(pud, _PAGE_RW); + pud = pud_clear_flags(pud, _PAGE_RW); + + /* + * Blindly clearing _PAGE_RW might accidentally create + * a shadow stack PUD (RW=0, Dirty=1). Move the hardware + * dirty value to the software bit. 
+ */ + if (pud_dirty(pud)) + pud = pud_mksaveddirty(pud); + return pud; } static inline pud_t pud_mkdirty(pud_t pud) { - return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY); + pudval_t dirty = _PAGE_DIRTY; + + /* Avoid creating (HW)Dirty=1, Write=0 PUDs */ + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && !pud_write(pud)) + dirty = _PAGE_SAVED_DIRTY; + + return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY); } static inline pud_t pud_mkdevmap(pud_t pud) @@ -607,7 +697,11 @@ static inline pud_t pud_mkyoung(pud_t pud) static inline pud_t pud_mkwrite(pud_t pud) { - return pud_set_flags(pud, _PAGE_RW); + pud = pud_set_flags(pud, _PAGE_RW); + + if (pud_dirty(pud)) + pud = pud_clear_saveddirty(pud); + return pud; } #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY @@ -724,6 +818,8 @@ static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask); static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { pteval_t val = pte_val(pte), oldval = val; + bool wr_protected; + pte_t pte_result; /* * Chop off the NX bit (if present), and add the NX portion of @@ -732,17 +828,43 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) val &= _PAGE_CHG_MASK; val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PTE_PFN_MASK); - return __pte(val); + + pte_result = __pte(val); + + /* + * Do the saveddirty fixup if the PTE was just write protected and + * it's dirty. 
+ */ + wr_protected = (oldval & _PAGE_RW) && !(val & _PAGE_RW); + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && wr_protected && + (val & _PAGE_DIRTY)) + pte_result = pte_mksaveddirty(pte_result); + + return pte_result; } static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) { pmdval_t val = pmd_val(pmd), oldval = val; + bool wr_protected; + pmd_t pmd_result; - val &= _HPAGE_CHG_MASK; + val &= (_HPAGE_CHG_MASK & ~_PAGE_DIRTY); val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK; val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK); - return __pmd(val); + + pmd_result = __pmd(val); + + /* + * Do the saveddirty fixup if the PMD was just write protected and + * it's dirty. + */ + wr_protected = (oldval & _PAGE_RW) && !(val & _PAGE_RW); + if (cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && wr_protected && + (val & _PAGE_DIRTY)) + pmd_result = pmd_mksaveddirty(pmd_result); + + return pmd_result; } /* From patchwork Sun Mar 19 00:15:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44F9FC76196 for ; Sun, 19 Mar 2023 00:16:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A103628000F; Sat, 18 Mar 2023 20:16:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 92381280001; Sat, 18 Mar 2023 20:16:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7755728000F; Sat, 18 Mar 2023 20:16:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 5FCA7280001 for ; Sat, 18 Mar 2023 20:16:26 -0400 (EDT) Received: 
from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 3AF62140B78 for ; Sun, 19 Mar 2023 00:16:26 +0000 (UTC) X-FDA: 80583731172.16.548D897 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf21.hostedemail.com (Postfix) with ESMTP id 2FBA21C0012 for ; Sun, 19 Mar 2023 00:16:24 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=cq+JFaCR; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184984; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=Pwhy/M47BV1NmQ8E1swDHEJVYRiFTE7CnNQ7FqeGg+g=; b=aRjyy6gBG38uuq6oboNStWkprVOOpzd1EtFJzA5PtLVDFekeNmAvjzdRaPOf+JZ98kV82N Kh1o63pLgsyYBeJNLlvb5ubFki0WTezmRc34SdPNZfM7IYmxub+8XYnp2Hq8n4Mh2xWhsR RucGCxG0+c1KJz69vwLGPJNiiIyMNfA= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=cq+JFaCR; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184984; a=rsa-sha256; cv=none; b=ehgiguW65c5qDu5Pn4fdW4Qz2ZTKQfvRHgtv15ZPaGFI43e0anUXAelY60/YhZ/D97jWL1 mCfTym8zGfo0sxYACbY4v7NA5x7K5xiM+CujQ57XDp/LTPBJQGCcRIDitXrnIJ6bdxO9KU ofdy/4lUtyc6JAC1frbY59116rqDQ0k= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184984; x=1710720984; h=from:to:cc:subject:date:message-id:in-reply-to: 
references; bh=Za8k1YHP/NKzL9PLFScEe9Ide1J8X/tcTz7SUA3IT3Y=; b=cq+JFaCRDbkB63G78+4EFbff8iH9Px2yc0mpjgkM3LRTzCNCNPrWORmG xPqwD/m2R1j31oesgz7yNQmFRzR3A+431ca+9s9NsBM18rbf8EjrKjnur v8Z4/j8yfngiaKLt1tvmgBf7tfYQx5FIRToToVFpGYTwH/3+gLxxvGn7q yXRIU/laLV7eEV2zNuQmyGJjV/5DKXt5s4yslxrRcS19Njgd7OAemmHV5 5uELI7EreKawLhEbReLvQZx56kZhC0XZXFbytH52jx2wfrIsBMagQHfth FhhVnNPCTIARHLy+FnBzzAvem5qe+fjMZqWw4RK46SbnSbfhrHn+gazwz w==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491107" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491107" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:23 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672849" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672849" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:21 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 17/40] mm: Move VM_UFFD_MINOR_BIT from 37 to 38 Date: Sat, 18 Mar 2023 17:15:12 -0700 Message-Id: <20230319001535.23210-18-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 2FBA21C0012 X-Rspam-User: X-Stat-Signature: qdgxs4gqbn6neccis167coarojmhgmeo X-HE-Tag: 1679184984-383442 X-HE-Meta: U2FsdGVkX18t7CpQ3KnR+g0rr2WcvFGjKaFG+Xzb9xlHkOuygym76wBx4NY1NIxvwZYQN0t82FSd5t/RTzLE8WFHRS9fs1SaG7L5tpCunekzWciRh6RIzFKwJ0UmPwbYo238DacrAoHY4cB4lewK3a88KyUJCSSYB/NBG2vESlkPX5isHVI21X9MzeYVtvyNLfgNyf55UvS/P2NbTpKTZDJCoFF0myvbNA4SLIQBlRcfXNsN3y00p82UMq2ksltQ4OSAS+jEKUwgo4WkA/OsbjuIpYUmme7faYJhuEgfIZotCjVl0IrQ2O40R3P2tXymjD693jGrIIR4AIA3hfuj575Oh/9tzuGYVvqxtsgswfdGBEMVU3/fH6PoDTuReifai4yIx3HIKNCmbcxXrWtIA2jzsehYd+Qzz0u3t/IFm2qw/7hyfUfDw7rlQPlCDd7dqyVJurtgJKmvyHp2W4rXtm60pCNttGDcfARnN6cEK+bFPnYfwc/FdJaos5h99pC+Yx+AqAnl8YaA64mNXRGR+Ve3UcD9pAgmkDl0mp3oD59XdkRKqDbxYPu0YyJJ2IviGl2w8vwFPcDqfWv1JEos0hlE0YynXryRPMwpLUpgURNdijl8GLyBGvTz5vF8Z8qawta+JYyuzxXpCZoJyfO6VIVhtKv5/gK6+UmAO760E1oavmuIvZs5tHcBJ8Sw8u/5Knz9s3gH7fKlVIkLF5cEHFM0uPG6QNtQq+nosxUiIxiZ/a5YeN7FEJLwjAE8I3P07DxfWoBNeDlKSaGxxMMrmnZWJH/4yS8/Qanl3++CvMCg6hHik3r5dG5fl3ep2wYMI49qv5PxHd2Gf/mIIpIgldVVz0DWm0a14XnZtGMCwj+VnwMv7Fn1eYIPqoN8FMxumKxtJw3i5dgOEq/Ga7oREYKvnJeKYRHd4JgFUbaRxRRnFiRT/3aQ3kK+TOgEsPCDvkGrXGKxe1l9bjN4/lB +qwFFtM+ 
416X+JNAlxncbY7DiOc7VAISugZ1f1gymynZHy20GvxfCOf/MROtsvV9SPiFJFAdZormuuN59kTGRaeB5H9gW5SwVBteVG7X8xH7NTy1/z5hX0OTnjy8qXJG42EozBxSsH9CkUKBFjfz9ZsB2bWTGMmg6748zC2ygNbFPbL2xpE74/00NG+YTDETh7GJ7E0F9URB5iK0owuLhHXWW02nWPTYk0m3ldcdgoyqRuJuemQZ9Uu+8BO3uN3XbSIzKpNJz9M7ICIrh2P4p1+eSHDOSqbVbwvipXhU9iqqFbN/UEr/VLK+aYopmvHzqBOAfQQtOsKUuBgtwnVW55reZEU73XrXeExQRYXxPorV6nAPCpSU+H8UypS83G7pMMBgFSORimRMOUnR1zXWpwVrD+JOVw6Orf0UnDwOdtrorBhYhJCqgVA0Pmn34GGhtQgmUZHAaJMic
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Yu-cheng Yu

The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly.

Future patches will introduce a new VM flag VM_SHADOW_STACK that will be VM_HIGH_ARCH_BIT_5. VM_HIGH_ARCH_BIT_0 through VM_HIGH_ARCH_BIT_4 are bits 32-36, and bit 37 is the unrelated VM_UFFD_MINOR_BIT. For the sake of order, make all VM_HIGH_ARCH_BITs stay together by moving VM_UFFD_MINOR_BIT from 37 to 38. This will allow VM_SHADOW_STACK to be introduced as 37.
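The resulting bit layout can be sanity-checked with a few compile-time assertions. The sketch below is a standalone illustration, not kernel code: the MODEL_* names are hypothetical, and it assumes the five existing arch bits start at bit 32 so that they occupy 32-36.

```c
#include <stdint.h>

#define MODEL_BIT(n) (UINT64_C(1) << (n))

/* Assumed existing layout: five arch bits occupying 32-36. */
#define MODEL_VM_HIGH_ARCH_BIT_0 32
#define MODEL_VM_HIGH_ARCH_BIT_4 36

/* After the move: bit 37 joins the arch range, UFFD minor goes to 38. */
#define MODEL_VM_HIGH_ARCH_BIT_5 37
#define MODEL_VM_UFFD_MINOR_BIT  38

#define MODEL_VM_SHADOW_STACK MODEL_BIT(MODEL_VM_HIGH_ARCH_BIT_5)
#define MODEL_VM_UFFD_MINOR   MODEL_BIT(MODEL_VM_UFFD_MINOR_BIT)

/* The arch bits stay contiguous. */
_Static_assert(MODEL_VM_HIGH_ARCH_BIT_5 == MODEL_VM_HIGH_ARCH_BIT_4 + 1,
	       "arch bits must stay contiguous");

/* The two flags never share a bit. */
_Static_assert((MODEL_VM_SHADOW_STACK & MODEL_VM_UFFD_MINOR) == 0,
	       "flags must not share a bit");
```

Had VM_UFFD_MINOR_BIT stayed at 37, the second assertion would fail to compile, which is the collision this patch avoids.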
Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Reviewed-by: Axel Rasmussen Acked-by: Mike Rapoport (IBM) Acked-by: Peter Xu Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Reviewed-by: David Hildenbrand --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index af652444fbba..a1b31caae013 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -377,7 +377,7 @@ extern unsigned int kobjsize(const void *objp); #endif #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR -# define VM_UFFD_MINOR_BIT 37 +# define VM_UFFD_MINOR_BIT 38 # define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */ #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ # define VM_UFFD_MINOR VM_NONE From patchwork Sun Mar 19 00:15:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180146 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AED46C7618A for ; Sun, 19 Mar 2023 00:16:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38124280010; Sat, 18 Mar 2023 20:16:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 2BD79280001; Sat, 18 Mar 2023 20:16:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 022AE280010; Sat, 18 Mar 2023 20:16:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id E0A5E280001 for ; Sat, 18 Mar 2023 20:16:27 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com 
(Postfix) with ESMTP id C2B43120C75 for ; Sun, 19 Mar 2023 00:16:27 +0000 (UTC) X-FDA: 80583731214.08.92E93C3 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf11.hostedemail.com (Postfix) with ESMTP id CC20340012 for ; Sun, 19 Mar 2023 00:16:25 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=jdNKSe32; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184986; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=Mr90LVFmE9gifcOMDGIP+CURvzz5o3WovfUcqTtHWl0=; b=hHnrOZ64c7EXNT5nnCwNqAJZajJ2rIHPU5flkPISq/LYeOiNgOYDi95CMJKbk+cqAlxfNG 8rnvLugk66K2N92qlTC7myYMRH7XMdeWNHXP/Tk4k8djw9zW7bKCWDFXCtCT6yR/oDyP5O Ub/giS8wjLRzXIl0A+QRoHhwhoH9rRA= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=jdNKSe32; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184986; a=rsa-sha256; cv=none; b=vZXduC6G6nuMan8xkAoynefKeyfH1DtlAFVx0UIWlAis4ujS3a4BLMQgad/x+r2r/MToJS WSiwYhFQg7SC/tU0liIFBUQmDcBqOcjkB7u3j2JcC9tzLI2jRQMSaSyiDHqnHQu+GPYdLd 1x1o1F1sV4peKPi5W3SZbdEtd8s+zgY= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184985; x=1710720985; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=zgMYcW7jhLowf3LtAX7K40lEhH7jOSIjySbX8LofR4w=; 
b=jdNKSe32DX4Lbm2njmNTI3jDpYbc/DX8zmQOvURzDWSxO5VW5HzHA8fu 5eekhBvCFdsphruZg2HtP6xD1LXzCMeshRu8owH/xM/2oryqpdTHvoUAA VMQQC/1GWMNO1AsoOkCDLJADBHUKYTYptLI5Tju/iFkKW5WDkTGHG1r/s oREVnuP+OQWSwpKL59eV1Q20XuITG1/8sZFJF/YWh+kJoZLLwzlRQj6/k BY4+pFUxEqB9X1bGmeOpeQeOH94pVw3UCAAsBWJOZyIGrVvzMyqkVWI1B 1tmCDtkU8+bfRkuThLZsRDTvHC2wj2eZTjDZsDJ1vIY1sDVFG9b3u3Vv7 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491129" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491129" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:25 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672855" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672855" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:23 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 18/40] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Sat, 18 Mar 2023 17:15:13 -0700 Message-Id: <20230319001535.23210-19-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: CC20340012 X-Rspam-User: X-Stat-Signature: xyzaxpapr4eeaaxhib4mcqfyobdp8pit X-HE-Tag: 1679184985-566079 X-HE-Meta: U2FsdGVkX192zZp3Fbkmdi7vK4bfvduV0V8RmKtWCow6aUPc69ExGk3yssI1qUyS2VsgrePxPiFDa3OpTVTZmf8U+ZS3bKvRsjA0XCtW8oI3kEmeb1WJEjnTWkAKlgWO5WCtumPD7zw5JJGw8YwR1mPtpjCDKFmFHRgdyiy+zBi2e0SCLqMVQv+jcmaxxhpNacm6oFjhD62vDOCaLBYdE4dsPUResmEti/SjUT7RX+MX0JqpYnvHByheyXjAfujpKq0X4xasfrflQGAjcVVgbdsFExRcxvHWWkqpBi4Rpyg6UDRPw7lpvVFF56nSwXCyFyC3wbkW8YG+yG6ILZUd7uAjfQLzJwhgFbsmGRU1hnFI8zOcTP80tx2YlBllPFf357+ZOn8DZfvBaelDn/tb4KIdvrKHSAR722hBTspX1tW8unytSGPYqqwv8QaUzq3kgwr4ICWExuqx1HCmkiCwCwfaAlsGafPEbSoUJjB7b7/WiWn9fIwbPYKa2Mxfv6Bj+4WA+rdueDtE2LBjUGPb/TjCCwPwo12j1ib3kvPgCMLcOnLgq7NIfl6m6MArCeJT+Hy6WbL0BqwsshkBwg69Qcz6Jh4b7fJOUNhsxen8c3P0OL1QOXaBfRymt6djwrahsx1gkododkH2jd1ZGjMZhm5jqq/TSOSOy+3u7mg6gAzJVQAmzpqrIiTdYK2osSAmyCb34rBLEWaIqL+O+8h5tAujrTk8owJEHAAaWvY98Ndb24vyOh4wjnuS9NK8DoKyZX309SkepQ9Umyo8UAOGjA9L+/22os7bWfzLfLUtT23N6l5CMdx8Z8QVs1Ihya+4t21a0FBBm4cgwMeC5JyN4JS/LsDOX4A/S2LFfaDDoMWjb0pSf12pD/Zm/9iohVknB0/xRz7l+tzvSPTxGwf2aGO2EpgwxY51YlX9fAqIxWLsW/MuB+4iOiBPxmA46q0P0eIheqOfONmvU1nKx1M ktV2XYW4 
fAF5kGjOoAKc0LJOCOIk21lAdfdzcAhILoc7V//oPIpCx9J27FNX1uaeM06i9y3dLT7nlZ3fxkm9Qa7JJshVgFjGT3jE+6BXO04bVT95+sjU1VBAPXxouaEQf41B5ujWbjwtOVZDwbZNiBjhkGI+7BNQgAfw8ro+Q3jMu06SQy54wOOFkWFdq2hFblQtKTDMOcV0MMwWtxXfC4o0f7EE7BWW2z2G5VN4AlhEdHN4f8SHHaXJz2YWFKL7aK81zdshYS/Euk1bZWEts3K/J/Vs0Gk0VeVeSzizbDhM9oiR6zr/Xpl/fKGH7VVyMp6HuqUrUdaXPrDyC5O3NZ2D4aU7K/pNc0QgiPlKL7h0Uw083ueAqwqawRkgsGOHLoDGl2gQ37HGq+GRUYJHeVts= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu New hardware extensions implement support for shadow stack memory, such as x86 Control-flow Enforcement Technology (CET). Add a new VM flag to identify these areas, for example, to be used to properly indicate shadow stack PTEs to the hardware. Shadow stack VMA creation will be tightly controlled and limited to anonymous memory to make the implementation simpler and since that is all that is required. The solution will rely on pte_mkwrite() to create the shadow stack PTEs, so it will not be required for vm_get_page_prot() to learn how to create shadow stack memory. For this reason document that VM_SHADOW_STACK should not be mixed with VM_SHARED. Co-developed-by: Rick Edgecombe Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Reviewed-by: Kirill A. Shutemov Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook Acked-by: David Hildenbrand --- v7: - Use lightly edited commit log verbiage from (David Hildenbrand) - Add explanation for VM_SHARED limitation (David Hildenbrand) v6: - Add comment about VM_SHADOW_STACK not being allowed with VM_SHARED (David Hildenbrand) v3: - Drop arch specific change in arch_vma_name(). 
   The memory can show as anonymous (Kirill)
 - Change CONFIG_ARCH_HAS_SHADOW_STACK to CONFIG_X86_USER_SHADOW_STACK in
   show_smap_vma_flags() (Boris)
---
 Documentation/filesystems/proc.rst | 1 +
 fs/proc/task_mmu.c                 | 3 +++
 include/linux/mm.h                 | 8 ++++++++
 3 files changed, 12 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 9d5fd9424e8b..8b314df7ccdf 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -564,6 +564,7 @@ encoded manner. The codes are the following:
     mt    arm64 MTE allocation tags are enabled
     um    userfaultfd missing tracking
     uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6a96e1713fd5..324b092c2ac9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+		[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
 	};
 	size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a1b31caae013..097544afb1aa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -326,11 +326,13 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit architectures */
+#define VM_HIGH_ARCH_BIT_5	37	/* bit only usable on 64-bit architectures */
 #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
 #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+#define VM_HIGH_ARCH_5	BIT(VM_HIGH_ARCH_BIT_5)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -346,6 +348,12 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */
+#else
+# define VM_SHADOW_STACK	VM_NONE
+#endif
+
 #if defined(CONFIG_X86)
 # define VM_PAT		VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)

From patchwork Sun Mar 19 00:15:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180147
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 19/40] x86/mm: Check shadow stack page fault errors
Date: Sat, 18 Mar 2023 17:15:14 -0700
Message-Id: <20230319001535.23210-20-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow-stack mappings can result in faults in
normal, valid operation just like regular accesses to regular mappings.
Shadow stacks need some of the same features like delayed allocation, swap
and copy-on-write. The kernel needs to use faults to implement those
features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate
a fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs
to create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in
userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
for shadow stack accesses, even if the access was a shadow stack read.

For the purpose of making this clearer, consider the following example. If
a process has a shadow stack, and forks, the shadow stack PTEs will become
read-only due to COW. If the CPU in one process performs a shadow stack
read access to the shadow stack, for example executing a RET and causing
the CPU to read the shadow stack copy of the return address, then in order
for the fault to be resolved the PTE will need to be set with shadow stack
permissions. But then the memory would be changeable from userspace (from
CALL, RET, WRSS, etc). So this scenario needs to trigger COW, otherwise
the shared page would be changeable from both processes.

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping. Also, generate the errors for invalid shadow stack accesses.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Further tweak commit log (dhansen, Boris)

v7:
 - Update comment in fault handler (David Hildenbrand)

v6:
 - Update comment due to rename of Cow bit to SavedDirty

v5:
 - Add description of COW example (Boris)
 - Replace "permissioned" (Boris)
 - Remove capitalization of shadow stack (Boris)
---
 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a498ae1fbe66..7beb0ba6b2ec 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1117,8 +1117,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1310,6 +1324,14 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * Read-only permissions can not be expressed in shadow stack PTEs.
+	 * Treat all shadow stack accesses as WRITE faults. This ensures
+	 * that the MM will prepare everything (e.g., break COW) such that
+	 * maybe_mkwrite() can create a proper shadow stack PTE.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)

From patchwork Sun Mar 19 00:15:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180148
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 20/40] x86/mm: Teach pte_mkwrite() about stack memory
Date: Sat, 18 Mar 2023 17:15:15 -0700
Message-Id: <20230319001535.23210-21-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

If a VMA has the VM_SHADOW_STACK flag, it is shadow stack memory. So
when it is made writable with pte_mkwrite(), it should create shadow stack
memory, not conventionally writable memory. Now that all the places where
shadow stack memory might be created pass a VMA into pte_mkwrite(), it can
know when it should do this.

So make pte_mkwrite() create shadow stack memory when the VMA has the
VM_SHADOW_STACK flag. Do the same thing for pmd_mkwrite().

This requires referencing VM_SHADOW_STACK in these functions, which are
currently defined in pgtable.h, however mm.h (where VM_SHADOW_STACK is
located) can't be pulled in without causing problems for files that
reference pgtable.h. So also move pte/pmd_mkwrite() into pgtable.c, where
they can safely reference VM_SHADOW_STACK.

Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: Deepak Gupta
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)

v6:
 - New patch
---
 arch/x86/include/asm/pgtable.h | 20 ++------------------
 arch/x86/mm/pgtable.c          | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 05dfdbdf96b4..d81e7ec27507 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -456,15 +456,7 @@ static inline pte_t pte_mkwrite_kernel(pte_t pte)
 
 struct vm_area_struct;
 
-static inline pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
-{
-	pte = pte_mkwrite_kernel(pte);
-
-	if (pte_dirty(pte))
-		pte = pte_clear_saveddirty(pte);
-
-	return pte;
-}
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma);
 
 static inline pte_t pte_mkhuge(pte_t pte)
 {
@@ -601,15 +593,7 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
 	return pmd_set_flags(pmd, _PAGE_ACCESSED);
 }
 
-static inline pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
-{
-	pmd = pmd_set_flags(pmd, _PAGE_RW);
-
-	if (pmd_dirty(pmd))
-		pmd = pmd_clear_saveddirty(pmd);
-
-	return pmd;
-}
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
 
 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e4f499eb0f29..98856bcc8102 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -880,3 +880,29 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return pte_mkwrite_shstk(pte);
+
+	pte = pte_mkwrite_kernel(pte);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_saveddirty(pte);
+
+	return pte;
+}
+
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return pmd_mkwrite_shstk(pmd);
+
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_saveddirty(pmd);
+
+	return pmd;
+}

From patchwork Sun Mar 19 00:15:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180149
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 21/40] mm: Add guard pages around a shadow stack.
Date: Sat, 18 Mar 2023 17:15:16 -0700
Message-Id: <20230319001535.23210-22-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes
a new type of memory called shadow stack. This shadow stack memory has
some unusual properties, which requires some core mm changes to function
properly.

The architecture of shadow stack constrains the ability of userspace to
move the shadow stack pointer (SSP) in order to prevent corrupting or
switching to other shadow stacks. The RSTORSSP instruction can move the
SSP to different shadow stacks, but it requires a specially placed token
in order to do this. However, the architecture does not prevent
incrementing the stack pointer to wander onto an adjacent shadow stack. To
prevent this in software, enforce guard pages at the beginning of shadow
stack VMAs, such that there will always be a gap between adjacent shadow
stacks.

Make the gap big enough so that no userspace SSP changing operations
(besides RSTORSSP), can move the SSP from one stack to the next. The
SSP can be incremented or decremented by CALL, RET and INCSSP. CALL and
RET can move the SSP by a maximum of 8 bytes, at which point the shadow
stack would be accessed.

The INCSSP instruction can also increment the shadow stack pointer. It
is the shadow stack analog of an instruction like:

	addq    $0x80, %rsp

However, there is one important difference between an ADD on %rsp and
INCSSP. In addition to modifying SSP, INCSSP also reads from the memory
of the first and last elements that were "popped". It can be thought of
as acting like this:

	READ_ONCE(ssp);       // read+discard top element on stack
	ssp += nr_to_pop * 8; // move the shadow stack
	READ_ONCE(ssp-8);     // read+discard last popped stack element

The maximum distance INCSSP can move the SSP is 2040 bytes, before it
would read the memory. Therefore, a single page gap will be enough to
prevent any operation from shifting the SSP to an adjacent stack, since
it would have to land in the gap at least once, causing a fault.

This could be accomplished by using VM_GROWSDOWN, but this has a
downside. The behavior would allow shadow stacks to grow, which is
unneeded and adds a strange difference to how most regular stacks work.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Move and update comment (Boris, David Hildenbrand)

v5:
 - Fix typo in commit log

v4:
 - Drop references to 32 bit instructions
 - Switch to generic code to drop __weak (Peterz)

v2:
 - Use __weak instead of #ifdef (Dave Hansen)
 - Only have start gap on shadow stack (Andy Luto)
 - Create stack_guard_start_gap() to not duplicate code
   in an arch version of vm_start_gap() (Dave Hansen)
 - Improve commit log partly with verbiage from (Dave Hansen)
---
 include/linux/mm.h | 52 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 097544afb1aa..d09fbe9f43f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -349,7 +349,36 @@ extern unsigned int kobjsize(const void *objp);
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
 #ifdef CONFIG_X86_USER_SHADOW_STACK
-# define VM_SHADOW_STACK	VM_HIGH_ARCH_5 /* Should not be set with VM_SHARED */
+/*
+ * This flag should not be set with VM_SHARED because of lack of support
+ * core mm. It will also get a guard page. This helps userspace protect
+ * itself from attacks. The reasoning is as follows:
+ *
+ * The shadow stack pointer(SSP) is moved by CALL, RET, and INCSSPQ. The
+ * INCSSP instruction can increment the shadow stack pointer. It is the
+ * shadow stack analog of an instruction like:
+ *
+ *   addq $0x80, %rsp
+ *
+ * However, there is one important difference between an ADD on %rsp
+ * and INCSSP. In addition to modifying SSP, INCSSP also reads from the
+ * memory of the first and last elements that were "popped". It can be
+ * thought of as acting like this:
+ *
+ * READ_ONCE(ssp);       // read+discard top element on stack
+ * ssp += nr_to_pop * 8; // move the shadow stack
+ * READ_ONCE(ssp-8);     // read+discard last popped stack element
+ *
+ * The maximum distance INCSSP can move the SSP is 2040 bytes, before
+ * it would read the memory. Therefore a single page gap will be enough
+ * to prevent any operation from shifting the SSP to an adjacent stack,
+ * since it would have to land in the gap at least once, causing a
+ * fault.
+ *
+ * Prevent using INCSSP to move the SSP between shadow stacks by
+ * having a PAGE_SIZE guard gap.
+ */
+# define VM_SHADOW_STACK	VM_HIGH_ARCH_5
 #else
 # define VM_SHADOW_STACK	VM_NONE
 #endif
@@ -3107,15 +3136,26 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
 	return mtree_load(&mm->mm_mt, addr);
 }
 
+static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
+{
+	if (vma->vm_flags & VM_GROWSDOWN)
+		return stack_guard_gap;
+
+	/* See reasoning around the VM_SHADOW_STACK definition */
+	if (vma->vm_flags & VM_SHADOW_STACK)
+		return PAGE_SIZE;
+
+	return 0;
+}
+
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
+	unsigned long gap = stack_guard_start_gap(vma);
 	unsigned long vm_start = vma->vm_start;
 
-	if (vma->vm_flags & VM_GROWSDOWN) {
-		vm_start -= stack_guard_gap;
-		if (vm_start > vma->vm_start)
-			vm_start = 0;
-	}
+	vm_start -= gap;
+	if (vm_start > vma->vm_start)
+		vm_start = 0;
 	return vm_start;
 }
 

From patchwork Sun Mar 19 00:15:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180150
From: Rick Edgecombe
To: x86@kernel.org, "H .
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 22/40] mm/mmap: Add shadow stack pages to memory accounting Date: Sat, 18 Mar 2023 17:15:17 -0700 Message-Id: <20230319001535.23210-23-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 5164A40007 X-Rspam-User: X-Stat-Signature: kzj8zwmsgfdf6wtrabthxw33wdwpzjjf X-HE-Tag: 1679184992-925483 X-HE-Meta: 
U2FsdGVkX19CZ39Qg/aQEpG1qm1GStoroNKKSzBhYHjhf/uVEIXTzFzcTCB2rO0SsLOjWqiU0M5FeGkQP1Uu+xtTzY1hiHczD7E1E4Auv245cLSDxJsukJWWpDm3XVlWeL6QoGs+NOrTP48rGt+p779B3v0dTHJdtJ78G3ycWpKkDR/9mVCIzelxN6rZK1O71pIRJhDQ7z1RinUvK/urY9Dhf8BybxHgKrdv4gCHOnFp9W6wFluMHduiXqIkatTj7guV+WAsDCK283whh8uo9SFI7tpwL1ZVthFmQpa53ebH5JjH1wsNnZkyiZ4ca92JGP/FnidsognQZHv56mUrEZFSYhzneEs8IcmqHdhe2fBi7zGwsX9pAfbY8+GJASmlJCZNDyy4d/0Rl8naexZQsKJqxC87DZamdgwNEuTzqFzYxczeomlai/Jqd+NMCSzjSXEX6y/6lvEotuvs90Ob2l7LUu4oMJoSwLIUkQaW0rtVXfm8sEW759yBs5wce/eVmP3MN1nOiD+tjmtYWUgdV5qQopWAyp6cMPntZzOIOAQ+YwmeoQNqBuslxMhTpFwFguha4j3OlBclYFwF8Aro5ifVML3OUdNKKE5w2nU6hnZAZYRtYFM9lRO22I5UHZzW+JLaaVN8ZFHxk8u9gAksU7CTL/O/N5k2uZC7537EbRjnrpu/rWvi4hzo6/LgPJmKOBidc6FCcy9CDl24YWs6w3xN7Fd+Q3eynElAARZxu0ejGFzfq1G81LBqRE+u3ZWo9xihfbA1S+2l/z1PqRWjfcGyhNhOEKkmdoW0KhDUXUTpw4hDdh5RU7ZHvRkY5TL4QuyZBijShWLGoGeNbRnGO7BlfiffYqRejDA1hs84XX3PqGn2eRFfszcOplTsyu/BOpUtZF+D20qCDOl6ap5cIZpksBHE0PZzx8soiveoygW+NQTQopuvsvMGC72DWEv1E9Cv3wM/Geb9Nd83XvJ dKTF8GM0 9lP3Cji+c4AWKabOuW5on5u4j/d3BLMup3MJYANn/vcGg33iTug5kwY+4qB3OXUcpjJVhEew7P76xJ9L+JToOn6cOvLrQpGzul+qBhd/keCyqjvnEDuGbRGg+0FdOXz3qUGNpPRp3WargOsT6IDD31tbdpI4ZET3Ovqx2wuQlw2yiNQ5LhDg79+uc/n0Xj9SpC0ZYjuj04G9RFfbI994zXhoXu1g6rgqT77QbOkxYDl+oV//XbbX8mfEJwTgA/f/Bq/CmEEdE19+ZMgkzBvz4xYVk+vMXlNxhVQz3FMSQoIxDU2I4TXaT65cRHYSpmMuo+d8Fx1OUwu5rsr0RqLGlZogxQ2+mL5IHHm+N9EEM1oDdTNbQ3XoC2kVOyg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which requires some core mm changes to function properly. 
Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: David Hildenbrand
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Update comment around is_stack_mapping() (David Hildenbrand)

v7:
 - Change is_stack_mapping() to know about VM_SHADOW_STACK so the
   additions in vm_stat_account() can be dropped. (David Hildenbrand)

v3:
 - Remove unneeded VM_SHADOW_STACK check in accountable_mapping()
   (Kirill)

v2:
 - Remove is_shadow_stack_mapping() and just change it to directly
   bitwise and VM_SHADOW_STACK.
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 7920a8b7982e..2e9f313fcf67 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -491,14 +491,14 @@ static inline bool is_exec_mapping(vm_flags_t flags)
 }
 
 /*
- * Stack area - automatically grows in one direction
+ * Stack area (including shadow stacks)
  *
  * VM_GROWSUP / VM_GROWSDOWN VMAs are always private anonymous:
  * do_mmap() forbids all other combinations.
  */
 static inline bool is_stack_mapping(vm_flags_t flags)
 {
-	return (flags & VM_STACK) == VM_STACK;
+	return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK);
 }
 
 /*

From patchwork Sun Mar 19 00:15:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180151
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 430E7C74A5B for ; Sun, 19 Mar 2023 00:16:39 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 8FE54280015; Sat, 18 Mar 2023 20:16:36 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 864A4280001; Sat, 18 Mar 2023 20:16:36 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 6191A280015; Sat, 18 Mar 2023 20:16:36 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 41091280001 for ; Sat, 18 Mar 2023 20:16:36 -0400 (EDT)
Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 1E72BAB048 for ; Sun, 19 Mar 2023 00:16:36 +0000 (UTC)
X-FDA: 80583731592.21.A18DB69
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf21.hostedemail.com (Postfix) with ESMTP id 0D3631C000F for ; Sun, 19 Mar 2023 00:16:33 +0000 (UTC)
Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=R5684uWn; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
d=hostedemail.com; s=arc-20220608; t=1679184994; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=AOIntr3VxGUPIG0WTgprd1+JG8OST+0GjG70hTrdOgU=; b=HDusJ8RWpFTXz3qTd58oRAoXnj1Eamh1E8i+sbfIHBVKgXt9A41+tN5AK+WZRKvWIZQ03U TLIHT4FmEvcQbO5pdn9NBwF7ENPT2/Ts0J7w6IBK9sCvFQVXIq4a+0fbxmUTiwCvfHKoJo dh8J9Xz88DNmCY3E6M3iXaFIx0MCpsY= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=R5684uWn; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184994; a=rsa-sha256; cv=none; b=oQt6IVGneqt1AbtQbm9/0jAAHlbBrpPDXy+j3VCjGEUJqyNluV2V+clmwN0MwvjlJEoNg/ 00reBf3GGscY8ZpnkuYeD/2T9yvPex0YVD+PsSca2jlDxkZSN9rc12IYVuAqX4QvGZ2iYh gBxNMrbhhmqBJnKkRqefbOscCbpSyAU= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184994; x=1710720994; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=JiFIyLCYmEZEMe86X9SZ26iJG6/i80APvHoHhUeLipc=; b=R5684uWn/GcioQnozX0J97XpqCYWcfr3u0VdKaKF4s5LC6QXgneYwW2k L63EMlDnQiMoy6NQakrLF+r0AMDwkA2+VB32l2vsEPDWKBI7c+loX5rnz oShOlQvoST32Fkmc8oNHTFW5uaBVPstA4X7r8wM8/teaVYzSDec40EdBP Ce7JgVPGqSEDv+cG5k7YnbANw41bWwU/z+5UichLW3/rsupwIjXlCTymC 139eDvNaj+k27+Sv2xZ5mHwO6OylNEq6Yp2SmqMX/o1VzQ2d1yHuXEeOl BizlpTkMA9mdrP+wKQ1rkA6Zwyz8YGlJmbcxdO1cyDxPe+lwRnde7r7RL Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491245" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491245" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: 
E=McAfee;i="6600,9927,10653"; a="749672874" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672874" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:31 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 23/40] mm: Re-introduce vm_flags to do_mmap() Date: Sat, 18 Mar 2023 17:15:18 -0700 Message-Id: <20230319001535.23210-24-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0D3631C000F X-Rspam-User: X-Stat-Signature: ms33irpt9jbt4bir6bmktfrgpy9pdfde X-HE-Tag: 1679184993-508672 X-HE-Meta: 
U2FsdGVkX1/8jVlRBCj1yDXVu+1G7OEYOm06VhU7jdy8NXkZcWu47n8oph6S0OeURE+4wA4KqfNLq90LoZkgDkZpfFNbsLdlysI/MNihf3JH/C8nPZVwznyLYBcEBN0R+92aRLNA8m6RkxnjIDNEIsh7ipqSsOCcjqWHV7QeI0lAIgHlfa9jf6y1la56BKPyGuXD2eSm9aOU76+mfloqrbJcipQzXVg3djnKSfIIz9go8wNrgU+SkW9UySNHsxrmzymoX599GlAnCJ53JQQVhij9l3OlZuWCxHGdTtFyMnZ7ATfVWvoLZXOFpiF4/mf/d+kkT4aCxRen5dxSKwgULAjq52N4uA3L3WAmK4U2SMOZSYQ8F4zq2AJlKjiljuUdJlOrOP83YCQZsqUmUWHX2zaYbCNsTNtuA2O8HeeokZDWcsnN4raqV9DYJxZGEnPtaTchFKRcQHpBKFjx5nuoYG87pkui8N8t7/Ci9E9t1gidNCxEzue3eH1x7Bx6G1K4R7RPWQRcmIDsotsT89KeJTh5KdtXynOQtI/adKClsPxSjTAyNC4SvsVWLzw07FOdD2olBxbYzURQ0HgJJprHVGqzQfXal1OKe4yDkzEI/bPYk5ostZqRiJRxpp5sGuvaQzDn9uinG6B+xRmJBOW+YbBuU9ZuWqd9HSGf6iwvPI8U4UTD956XbW8EFVTI6378aptgvhY3neGkEAmdLaP5yLVUGJ1gSX4ycsWK0HGouBa4b02lnSHauX5uWUCoCWvwWdfWrbuH5oryiYn+pMKDB1foBZq5BLCgC7BZR6B32+GFzmodQZF750UAC/2FqZw4nUUeUtVHoMD1O3zA+RjEQYzEldHbX2z3257cmc30/7t0lxfEx0uAwoWpdgys8KGV1xTHdK95+uijrEBuoTdplqWHev9YM0z/MD/4+7pcEMTS+Hdy/FjiHUhY4uZk7JyvvtqGiU+6cZCTuaoglaH +ulHJw2p ON4oH3oOIYzWNGD+WXdSQs910hh/Uah3T+79Z/26OnfYuQ2HkBsrkZ+gSNax4KPBGXEu0wLgQuO4MvhTqyO9thQDh+jFwkMS10NTEjpoHHolMeD85DvFpEKIXdN7NOaHbcFjm0rBLUTANCKLMPFjg+3tKUlQSqNky2Lg/UdFkZGkHpfHgdqwmFadwCcozmNIuZMfnfiWluihqBhwqd+o//XQHgrHYDwPpJsYWgZ9ljlPozEClhRp5rHHaB6x4KzFCnxJJasWBEAxfacpNQByk1WSnjUiswLsP8/g1V9BZQX3OCvbtv0IDiUYiMtmPiFvz//nBPSzitkobJhumzWaxx9BpCs55HM4OfVXa98kQZ0mUuuhJkftFCadNcI2i5ssiBNZ0DK0nxb0iV0IJHCd/4AHGCTXmg7nUESQVU7w7K/+vbag= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300f114 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). 
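As a rough illustration of why the assignment inside do_mmap() becomes `|=`, the following userspace sketch models the flag computation. Every name and bit value here (`model_prot_bits()`, `model_do_mmap_flags()`, `PROT_READ_BIT`, `PROT_WRITE_BIT`, and the numeric flag values) is a hypothetical stand-in rather than the kernel's definition; only the OR-in pattern is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions; values are illustrative only. */
typedef uint64_t vm_flags_t;
#define VM_READ         0x1ULL
#define VM_WRITE        0x2ULL
#define VM_MAYREAD      0x10ULL
#define VM_MAYWRITE     0x20ULL
#define VM_MAYEXEC      0x40ULL
#define VM_SHADOW_STACK (1ULL << 37)   /* illustrative bit position */
#define PROT_READ_BIT   0x1UL
#define PROT_WRITE_BIT  0x2UL

/* Simplified model of calc_vm_prot_bits(): translate PROT_* into VM_*. */
static vm_flags_t model_prot_bits(unsigned long prot)
{
	vm_flags_t f = 0;

	if (prot & PROT_READ_BIT)
		f |= VM_READ;
	if (prot & PROT_WRITE_BIT)
		f |= VM_WRITE;
	return f;
}

/*
 * Model of the flag computation after the patch: vm_flags arrives as a
 * parameter and the derived bits are OR-ed in, so a bit supplied by the
 * caller (such as VM_SHADOW_STACK) is preserved instead of overwritten.
 */
static vm_flags_t model_do_mmap_flags(unsigned long prot, vm_flags_t vm_flags)
{
	vm_flags |= model_prot_bits(prot) |
		    VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
	return vm_flags;
}
```

In this model, existing callers pass 0 and get the same result as before, while a shadow stack allocation passing `VM_SHADOW_STACK` keeps that bit in the final flags.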
Co-developed-by: Rick Edgecombe
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Peter Collingbourne
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
 fs/aio.c           |  2 +-
 include/linux/mm.h |  3 ++-
 ipc/shm.c          |  2 +-
 mm/mmap.c          | 10 +++++-----
 mm/nommu.c         |  4 ++--
 mm/util.c          |  2 +-
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b0b17bd098bb..4a7576989719 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -558,7 +558,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
 				 PROT_READ | PROT_WRITE,
-				 MAP_SHARED, 0, &unused, NULL);
+				 MAP_SHARED, 0, 0, &unused, NULL);
 	mmap_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d09fbe9f43f8..d389198e17c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3043,7 +3043,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
+	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
+	struct list_head *uf);
 extern int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
 			 unsigned long start, size_t len, struct list_head *uf,
 			 bool downgrade);
diff --git a/ipc/shm.c b/ipc/shm.c
index 60e45e7045d4..576a543b7cff 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1662,7 +1662,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 			goto invalid;
 	}
 
-	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/mmap.c
b/mm/mmap.c
index 740b54be3ed4..e2c8e8e611d4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1191,11 +1191,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
  */
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff,
-			unsigned long *populate, struct list_head *uf)
+			unsigned long flags, vm_flags_t vm_flags,
+			unsigned long pgoff, unsigned long *populate,
+			struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
-	vm_flags_t vm_flags;
 	int pkey = 0;
 
 	validate_mm(mm);
@@ -1256,7 +1256,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
 	 */
-	vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED)
@@ -2829,7 +2829,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 	file = get_file(vma->vm_file);
 	ret = do_mmap(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+			prot, flags, 0, pgoff, &populate, NULL);
 	fput(file);
 out:
 	mmap_write_unlock(mm);
diff --git a/mm/nommu.c b/mm/nommu.c
index 57ba243c6a37..f6ddd084671f 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1002,6 +1002,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long len,
 			unsigned long prot,
 			unsigned long flags,
+			vm_flags_t vm_flags,
 			unsigned long pgoff,
 			unsigned long *populate,
 			struct list_head *uf)
@@ -1009,7 +1010,6 @@ unsigned long do_mmap(struct file *file,
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	struct rb_node *rb;
-	vm_flags_t vm_flags;
 	unsigned long capabilities, result;
 	int ret;
 	VMA_ITERATOR(vmi, current->mm, 0);
@@ -1029,7 +1029,7 @@ unsigned long do_mmap(struct file *file,
 
 	/* we've determined that we can make the mapping, now translate what
we * now know into VMA flags */ - vm_flags = determine_vm_flags(file, prot, flags, capabilities); + vm_flags |= determine_vm_flags(file, prot, flags, capabilities); /* we're going to need to record the mapping */ diff --git a/mm/util.c b/mm/util.c index b8ed9dbc7fd5..a93e832f4065 100644 --- a/mm/util.c +++ b/mm/util.c @@ -539,7 +539,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr, if (!ret) { if (mmap_write_lock_killable(mm)) return -EINTR; - ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate, + ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate, &uf); mmap_write_unlock(mm); userfaultfd_unmap_complete(mm, &uf); From patchwork Sun Mar 19 00:15:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180152 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B23B6C76196 for ; Sun, 19 Mar 2023 00:16:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F1C1C280016; Sat, 18 Mar 2023 20:16:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E7D8B280001; Sat, 18 Mar 2023 20:16:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C5943280016; Sat, 18 Mar 2023 20:16:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id A66DB280001 for ; Sat, 18 Mar 2023 20:16:37 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8AB581C6408 for ; Sun, 19 Mar 2023 00:16:37 +0000 (UTC) X-FDA: 80583731634.11.9A1654D Received: from mga14.intel.com (mga14.intel.com 
[192.55.52.115]) by imf11.hostedemail.com (Postfix) with ESMTP id 8F6AC4000E for ; Sun, 19 Mar 2023 00:16:35 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=hkc5btwE; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679184995; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=SYx+P7Ic7HNiZnYw3nI9Ocezil5osIwuNxYJyZqEM2o=; b=A6rL4NasthllxKUXZf21V1lbmfjDeLCyW87lIhMxPnrBgpXrooR2x7IXtdhpF9AgLVZmtA dS/vfMxM9pb1PhLZFjOc2X240L9vpSwUB5PUiYrEd/NpCPs6RDj0BLh+sqURu4EDFZ3cKq s7AgzoL+Rw/cbK/4IIW2EMDPgvBLYM4= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=hkc5btwE; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679184995; a=rsa-sha256; cv=none; b=aHqq52ddZuRgoFJ5paTNXIUlfJjsAxcX+bYFxOIALCovLRrI+CHgH0QFRWP35qIU3C1t/m kyUb5kxLeBVyPMRn+AfG1Qer1RX9AYNbVzCzeDEzVkTUwb3rYbzjJCbITI1r06agnsTa6I w897ZrRQC06/NIviLUeD7QjL7VytX2Y= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679184995; x=1710720995; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=opyjBN/VDdKGQFn13MiWP9WuORkQYDUhpnOpss/5QKc=; b=hkc5btwESL6le/ndhfzVkaWzkQYQL9Fh/wJlbOwCcfg4zS/YnfvuBW1j nYvoBMjJIukhhW6aWepzH5xx/0IfLeFCGur6/Dv3KLtzIOhLC8fbre3hU VntbOMrMLBFHi3C/aIjS91PHODh7PNeZYvzAXg4bTLbV2jmXVWrcK2yjK 
AGgSM/nz6ZBeW/iAavTN5wtcAnOpcKFwogL3lNNoMv8gcyT163/mYUL6A A3myPsabo1BFyTKf2RhdKE+17n2i/ReHWIY4/5vEN032exKjDO4NagTht KF6D9GLE2nRZFiXhuYGG/O8os4G4ZcFz+lrp1yljXuWQluj4rE6VmP0Zf A==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491268" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491268" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672879" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672879" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:33 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 24/40] mm: Don't allow write GUPs to shadow stack memory Date: Sat, 18 Mar 2023 17:15:19 -0700 Message-Id: <20230319001535.23210-25-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 8F6AC4000E X-Rspam-User: X-Stat-Signature: icqshrren76px8tniyaq3gd3rjcfzcth X-HE-Tag: 1679184995-313021 X-HE-Meta: U2FsdGVkX1/MKDfSqKuiEEo5CX2nli5xq2meNbHIXfzZM02ppbFKgIxbI0ygA5JTjtHksE2aZXyWcx00UmkVAGnpvgTsqAlteI/VnGZ/IMqsudroYPWfEzcAnbvYEtpm9GowUmaUiKbbpmX8oPV04k4drIKIaccForpMYIwxufuMpPIa09UY8Z8zBJIAAe+I4xfr1tLAU8S01Nz+pR51ZOnVL9lbK7t9ZcyaTTO/VNYmNoFjcoVgYYQRcOt732hSvXZB1VPYdQsjaptKkbyMSGQJylx3Qf1PzIFdlS2+56SrWCp/JLr6aQo9Fhm4olF56hulWCd3pEXMNWOxaj4zHQb/YwBWwmYNFjZGqK2HUsVvkx1Qou4L+Ms5VRXxDOiUOugoczwQuWDiiAhaSfrTG8alRbZn0hQREt0EP4itBh0qIpPff+huFXztnu9/mRgpf2D0M/AzfFQ8fqXP4kO34WOz12/gX0gOZmxoMYltkeqhyzGu8mAa6y2BdDExMZgFn00wWYYEx592NEWT6WaW+q4a2z2dZLFagVEPAQHkHl2mLGzi6sNqVV5W5xI/cmgseRYzm/FMPpNhAIP5HHdApK8rcYkDukFWMA3ZpQ64Kh8flE1WBVTDeyN2/gTSPo2ATnyV3xRkU6TLJaBUnVmfZxVmN7E1iBWhrZ2JwzMaDSUGOhGadQsf0GrJLOgH5jdmvYCoJjXUIdfjHzEM6Wll+KAXQv2hZUo12lg6oMK8g6Symy3Ru6Rh45TXfDtD91/xk8/MQd0tPQnrITemxM3fDQG3KH7eu5jOJSnAGOmx8EfBIQBSMDiCAkM3yoIDd/JJhXcKjdTA06WPR2XVI11i7Zm0pYMkYp/ucd+sKGEgCSnhvkppk/vyfPHYs5ybOkMZuNSNJrUhitlwxt+R1SpsbIlVTskXsNxJzERKacDhlLQu6CVPH3LwBUnQxObfD0orIqktwxAZF6RUdWu8ef7 eK9gk9ar 
pThJBCxlff8skMl/zX7+1RpDTCI9xE88YqU1WVgw8LAis+2REvSQ9OasySNsGJtkS/9iPwqfFuK9vA7U70KqHYqUAvcnlGLmtmGZNYhXXFjz/w2/NK/7bj0i83mMsOkFy4y/9QmH6CrPC/baG9ALbkDBbKVnj+0rs4K2CrKAHAiJhQsjyAe4xe5kQg/mNBGeknAZMUrDLBDRryuZwogw1/GqULI+RzOpxhfWaInTSkPxsZKemj7bDS03vzV1yThFaIxYZOSkRRCOukuCMUZxuFO/1UeaFogFx2h4U74OZqfOd8jpNYSIgeNGHj8Vpqq8/sX7TgoTiaF34enb+TjD8U41rcreOXSmH+0a8dv4FaUdZ4m3MlNK09q3Y1g==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

The x86 Control-flow Enforcement Technology (CET) feature includes a new type of memory called shadow stack. This shadow stack memory has some unusual properties, which require some core mm changes to function properly.

In userspace, shadow stack memory is writable only in very specific, controlled ways. However, since userspace can, even in the limited ways, modify shadow stack contents, the kernel treats it as writable memory. As a result, without additional work there would remain many ways for userspace to trigger the kernel to write arbitrary data to shadow stacks via get_user_pages(, FOLL_WRITE) based operations. To help userspace protect their shadow stacks, make this a little less exposed by blocking writable get_user_pages() operations for shadow stack VMAs.

Still allow FOLL_FORCE to write through shadow stack protections, as it does for read-only protections. This is required for debugging use cases.
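The rule described above can be sketched as a small userspace model of the write branch in check_vma_flags(). This is only an approximation under stated assumptions: `model_check_write_gup()` is a hypothetical helper, the flag values are illustrative rather than the kernel's, and the further checks the kernel performs after the FOLL_FORCE test are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins; bit values are illustrative, not the kernel's. */
typedef uint64_t vm_flags_t;
#define VM_WRITE        0x2ULL
#define VM_SHADOW_STACK (1ULL << 37)
#define FOLL_WRITE      0x1u
#define FOLL_FORCE      0x2u

/*
 * Simplified model of the write check: a write GUP is refused with
 * -EFAULT when the VMA is not writable OR is a shadow stack, unless
 * the caller passed FOLL_FORCE. Read GUPs are not affected here.
 */
static int model_check_write_gup(vm_flags_t vm_flags, unsigned int gup_flags)
{
	if (!(gup_flags & FOLL_WRITE))
		return 0;

	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
		if (!(gup_flags & FOLL_FORCE))
			return -EFAULT;
	}
	return 0;
}
```

In this model a shadow stack VMA is treated like a read-only one for write GUPs, even if `VM_WRITE` is also set, while `FOLL_FORCE` still gets through, which matches the debugging carve-out described in the commit message.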
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Acked-by: David Hildenbrand
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris, AndyL)

v3:
 - Add comment in __pte_access_permitted() (Dave)
 - Remove unneeded shadow stack specific check in
   __pte_access_permitted() (Jann)
---
 arch/x86/include/asm/pgtable.h | 5 +++++
 mm/gup.c                       | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index d81e7ec27507..2e3d8cca1195 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1638,6 +1638,11 @@ static inline bool __pte_access_permitted(unsigned long pteval, bool write)
 {
 	unsigned long need_pte_bits = _PAGE_PRESENT|_PAGE_USER;
 
+	/*
+	 * Write=0,Dirty=1 PTEs are shadow stack, which the kernel
+	 * shouldn't generally allow access to, but since they
+	 * are already Write=0, the below logic covers both cases.
+	 */
 	if (write)
 		need_pte_bits |= _PAGE_RW;
 
diff --git a/mm/gup.c b/mm/gup.c
index eab18ba045db..e7c7bcc0e268 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -978,7 +978,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		return -EFAULT;
 
 	if (write) {
-		if (!(vm_flags & VM_WRITE)) {
+		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
 			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE.
*/
From patchwork Sun Mar 19 00:15:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180153
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 25/40] x86/mm: Introduce MAP_ABOVE4G
Date: Sat, 18 Mar 2023 17:15:20 -0700
Message-Id: <20230319001535.23210-26-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which require some core mm changes to function
properly.

One of the properties is that the shadow stack pointer (SSP), which is a
CPU register that points to the shadow stack like the stack pointer points
to the stack, can't be pointing outside of the 32 bit address space when
the CPU is executing in 32 bit mode. It is desirable to prevent executing
in 32 bit mode when shadow stack is enabled because the kernel can't easily
support 32 bit signals.

On x86 it is possible to transition to 32 bit mode without any special
interaction with the kernel, by doing a "far call" to a 32 bit segment.
So the shadow stack implementation can use this address space behavior
as a feature, by enforcing that shadow stack memory is always mapped
outside of the 32 bit address space. This way userspace will trigger a
general protection fault which will in turn trigger a segfault if it
tries to transition to 32 bit mode with shadow stack enabled.

This provides a clean error generating border for the user if they
attempt to do 32 bit mode shadow stack, rather than leave the kernel in a
half working state for userspace to be surprised by.

So to allow future shadow stack enabling patches to map shadow stacks
out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior is
pretty much like MAP_32BIT, except that it has the opposite address
range. There are a few differences though.

If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the
MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit
syscall.

Since the default search behavior is top down, the normal kaslr base can
be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add its
own randomization in the bottom up case.

For MAP_32BIT, only the bottom up search path is used. For MAP_ABOVE4G
both are potentially valid, so both are used. In the bottom up search
path, the default behavior is already consistent with MAP_ABOVE4G since
mmap base should be above 4GB.

Without MAP_ABOVE4G, the shadow stack will already normally be above 4GB.
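
As a rough userspace illustration of the flag semantics described above
(a sketch, not part of this patch): the 0x80 value is the one this patch
adds to the uapi header, while map_above_4g() and is_above_4g() are
hypothetical helper names. The placement guarantee is only expected for a
64 bit syscall on a kernel carrying this change.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Value taken from the uapi hunk in this patch; older libc headers lack it. */
#ifndef MAP_ABOVE4G
#define MAP_ABOVE4G 0x80
#endif

/* True if addr lies at or above the 4GB boundary (SZ_4G in the kernel). */
static int is_above_4g(uint64_t addr)
{
	return addr >= (UINT64_C(1) << 32);
}

/*
 * Hypothetical helper: request an anonymous private mapping with
 * MAP_ABOVE4G set. Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_above_4g(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_ABOVE4G, -1, 0);
}
```

On a kernel with this patch, is_above_4g((uintptr_t)map_above_4g(len)) is
expected to hold in a 64 bit process. Kernels without it are assumed to
ignore the unknown flag bit rather than fail the call.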
So without introducing MAP_ABOVE4G, trying to transition to 32 bit mode
with shadow stack enabled would usually segfault anyway. These are already
pretty decent guard rails. But the addition of MAP_ABOVE4G is some small
complexity spent to make it more complete.

Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Use SZ_4G (Boris)

v5:
 - New patch
---
 arch/x86/include/uapi/asm/mman.h | 1 +
 arch/x86/kernel/sys_x86_64.c     | 6 +++++-
 include/linux/mman.h             | 4 ++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index 775dbd3aff73..5a0256e73f1e 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
+#define MAP_ABOVE4G	0x80		/* only map above 4GB */
 
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 #define arch_calc_vm_prot_bits(prot, key) (		\
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 8cc653ffdccd..c783aeb37dce 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -193,7 +193,11 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 
 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
 	info.length = len;
-	info.low_limit = PAGE_SIZE;
+	if (!in_32bit_syscall() && (flags & MAP_ABOVE4G))
+		info.low_limit = SZ_4G;
+	else
+		info.low_limit = PAGE_SIZE;
+
 	info.high_limit = get_mmap_base(0);
 
 	/*
diff --git a/include/linux/mman.h b/include/linux/mman.h
index cee1e4b566d8..40d94411d492 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -15,6 +15,9 @@
 #ifndef MAP_32BIT
 #define MAP_32BIT 0
 #endif
+#ifndef MAP_ABOVE4G
+#define MAP_ABOVE4G 0
+#endif
 #ifndef MAP_HUGE_2MB
 #define MAP_HUGE_2MB 0
 #endif
@@ -50,6 +53,7 @@
 		| MAP_STACK \
 		| MAP_HUGETLB \
 		| MAP_32BIT \
+		| MAP_ABOVE4G \
 		| MAP_HUGE_2MB \
 		| MAP_HUGE_1GB)
 

From patchwork Sun Mar 19 00:15:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180154
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 26/40] mm: Warn on shadow stack memory in wrong vma
Date: Sat, 18 Mar 2023 17:15:21 -0700
Message-Id: <20230319001535.23210-27-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

The x86 Control-flow Enforcement Technology (CET) feature includes a new
type of memory called shadow stack. This shadow stack memory has some
unusual properties, which require some core mm changes to function
properly.

One sharp edge is that PTEs that are both Write=0 and Dirty=1 are
treated as shadow stack by the CPU, but this combination used to be
created by the kernel on x86.
Previous patches have changed the kernel to now avoid creating these
PTEs unless they are for shadow stack memory. In case any missed corners
of the kernel are still creating PTEs like this for non-shadow stack
memory, and to catch any re-introductions of the logic, warn if any
shadow stack PTEs (Write=0, Dirty=1) are found in non-shadow stack VMAs
when they are being zapped. This won't catch transient cases but should
have decent coverage. It will be compiled out when shadow stack is not
configured.

In order to check if a PTE is shadow stack in core mm code, add two arch
breakouts, arch_check_zapped_pte() and arch_check_zapped_pmd(). This will
allow shadow stack specific code to be kept in arch/x86.

Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
Acked-by: David Hildenbrand
---
v8:
 - Update commit log verbiage (Boris)

v6:
 - Add arch breakout to remove shstk from core MM code.

v5:
 - Fix typo in commit log

v3:
 - New patch
---
 arch/x86/include/asm/pgtable.h |  6 ++++++
 arch/x86/mm/pgtable.c          | 12 ++++++++++++
 include/linux/pgtable.h        | 14 ++++++++++++++
 mm/huge_memory.c               |  1 +
 mm/memory.c                    |  1 +
 5 files changed, 34 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2e3d8cca1195..e5b3dce0d9fe 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1684,6 +1684,12 @@ static inline bool arch_has_hw_pte_young(void)
 	return true;
 }
 
+#define arch_check_zapped_pte arch_check_zapped_pte
+void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte);
+
+#define arch_check_zapped_pmd arch_check_zapped_pmd
+void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd);
+
 #ifdef CONFIG_XEN_PV
 #define arch_has_hw_nonleaf_pmd_young arch_has_hw_nonleaf_pmd_young
 static inline bool arch_has_hw_nonleaf_pmd_young(void)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 98856bcc8102..afab0bc7862b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -906,3 +906,15 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 	return pmd;
 }
+
+void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte)
+{
+	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
+			pte_shstk(pte));
+}
+
+void arch_check_zapped_pmd(struct vm_area_struct *vma, pmd_t pmd)
+{
+	VM_WARN_ON_ONCE(!(vma->vm_flags & VM_SHADOW_STACK) &&
+			pmd_shstk(pmd));
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..4a8970b9fb11 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -291,6 +291,20 @@ static inline bool arch_has_hw_pte_young(void)
 }
 #endif
 
+#ifndef arch_check_zapped_pte
+static inline void arch_check_zapped_pte(struct vm_area_struct *vma,
+					 pte_t pte)
+{
+}
+#endif
+
+#ifndef arch_check_zapped_pmd
+static inline void arch_check_zapped_pmd(struct vm_area_struct *vma,
+					 pmd_t pmd)
+{
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aaf815838144..24797be05fcb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1689,6 +1689,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 */
 	orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd,
 						tlb->fullmm);
+	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	if (vma_is_special_huge(vma)) {
 		if (arch_needs_pgtable_deposit())
diff --git a/mm/memory.c b/mm/memory.c
index d0972d2d6f36..c953c2c4588c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1389,6 +1389,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
+			arch_check_zapped_pte(vma, ptent);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
 						      ptent);

From patchwork Sun Mar 19 00:15:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180155
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 27/40] x86/mm: Warn if create Write=0,Dirty=1 with raw prot
Date: Sat, 18 Mar 2023 17:15:22 -0700
Message-Id: <20230319001535.23210-28-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

When user shadow stack is in use, Write=0,Dirty=1 is treated by the CPU
as shadow stack memory. So for shadow stack memory this bit combination
is valid, but when Dirty=1,Write=1 (conventionally writable) memory is
being write protected, the kernel has been taught to transition the
Dirty=1 bit to SavedDirty=1, to avoid inadvertently creating shadow
stack memory.
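
As a rough illustration of the transition described above (a userspace
model, not kernel code): the helper names are hypothetical, and using bit
58 for SavedDirty is an assumption made only for this sketch. Bit 1
(Write) and bit 6 (Dirty) are the usual x86 page table bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page table bit positions: bit 1 is Write (R/W), bit 6 is Dirty. */
#define PTE_RW		(UINT64_C(1) << 1)
#define PTE_DIRTY	(UINT64_C(1) << 6)
/* Assumption: a software-available bit stands in for SavedDirty here. */
#define PTE_SAVED_DIRTY	(UINT64_C(1) << 58)

/* Write=0,Dirty=1 is the combination the CPU treats as shadow stack. */
static bool pte_val_is_shstk(uint64_t val)
{
	return (val & (PTE_RW | PTE_DIRTY)) == PTE_DIRTY;
}

/*
 * Simplified model of write-protecting a PTE value: clear Write and, if
 * Dirty was set, move it to SavedDirty so the result is not Write=0,Dirty=1.
 */
static uint64_t pte_val_wrprotect(uint64_t val)
{
	val &= ~PTE_RW;
	if (val & PTE_DIRTY) {
		val &= ~PTE_DIRTY;
		val |= PTE_SAVED_DIRTY;
	}
	return val;
}
```

In this model, pte_val_wrprotect(PTE_RW | PTE_DIRTY) leaves only the
SavedDirty bit set, so pte_val_is_shstk() rejects the result, whereas a
bare PTE_DIRTY value is flagged as shadow stack.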
It does this inside pte_wrprotect() because it knows the PTE is not intended to be a writable shadow stack entry, it is supposed to be write protected. However, when a PTE is created by a raw prot using mk_pte(), mk_pte() can't know whether to adjust Dirty=1 to SavedDirty=1. It can't distinguish between the caller intending to create a shadow stack PTE or needing the SavedDirty shift. The kernel has been updated to not do this, and so Write=0,Dirty=1 memory should only be created by the pte_mkfoo() helpers. Add a warning to make sure no new mk_pte() start doing this, like, for example, set_memory_rox() did. Signed-off-by: Rick Edgecombe Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v8: - Update commit log verbiage (Boris) v6: - New patch (Note, this has already been a useful warning, it caught the newly added set_memory_rox() doing this) --- arch/x86/include/asm/pgtable.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e5b3dce0d9fe..7142f99d3fbb 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1032,7 +1032,15 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd) * (Currently stuck as a macro because of indirect forward reference * to linux/mm.h:page_to_nid()) */ -#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot)) +#define mk_pte(page, pgprot) \ +({ \ + pgprot_t __pgprot = pgprot; \ + \ + WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_USER_SHSTK) && \ + (pgprot_val(__pgprot) & (_PAGE_DIRTY | _PAGE_RW)) == \ + _PAGE_DIRTY); \ + pfn_pte(page_to_pfn(page), __pgprot); \ +}) static inline int pmd_bad(pmd_t pmd) { From patchwork Sun Mar 19 00:15:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180156 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13A8AC7619A for ; Sun, 19 Mar 2023 00:16:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BE42528001A; Sat, 18 Mar 2023 20:16:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B6CC7280001; Sat, 18 Mar 2023 20:16:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 972B128001A; Sat, 18 Mar 2023 20:16:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 80217280001 for ; Sat, 18 Mar 2023 20:16:44 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 550E0A02F7 for ; Sun, 19 Mar 2023 00:16:44 +0000 (UTC) X-FDA: 80583731928.18.8F5CEF9 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf11.hostedemail.com (Postfix) with ESMTP id 49DFA4000C for ; Sun, 19 Mar 2023 00:16:42 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JnW9iOcv; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185002; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=9CimsSkcnB1SixWm79FtbmcEJmTxe2uhP+S9LsdP8OM=; b=iuybNlMT7AQ7SgnsyZyaOhsDHGsCoHXnAhU9NizpbfLRAvymWTOmOPQiiO9NsppKIbm+yq 
iNBwiA/vH1Rl1oJnDdhetEc49PcUrVuzAkDRI3xmIyTShaqUGlvZYrbyd/V2CWCexeLl7S 1pSQEeIJxJtJkrJ0gbwnYMCGVwhKtjE= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=JnW9iOcv; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185002; a=rsa-sha256; cv=none; b=NaP8haTVUuK6+NLE/eSuib/ymClWW9asKibj7pa/cpM623BnTIaCQLLHxRht/zRZjuEUge igBxrIsyHU+uL4GKxLctub7jKiwmbP3XCJA4APrUs9qEyuU4c1cKS8hlO0y7knEhdtL9gJ d1Lh1gi5I3PfvpF5nKpTKRsULNC1fOM= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185002; x=1710721002; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=0ExKgIRjskyaMqTzivet8rEqvydaMzUAlH3acHHtrn0=; b=JnW9iOcvLm57iUZVsIUvNUT0UDEmLJb7/0IIGjFJvDmpQ9xHrvW6nyqf VUYxFTD/LneSq1iwwu06qvFmHmxe9gCQ68E/Musej3TtdfF6HEADo5+x1 8q+LgwpttVe0GgEaC3q44YWcTYKI10i4sn7yHZDoqVxGg+ndZwro1dFoi rc58VSmDwAGxqWiLkL+ZauyNA/9AXwPAAZEi3/Idy3BR0bOLyFkCj+1VF GZpTfQr9FLrGQORR3nrLfygzJKX3TtMRzw6oCVh0poSZltCNM7iak+TD/ bYnK/VzLBLGdvmzDXFbcke9saAsYHd0zkA/+GTRtSPoYQO+/9I38Ao54R w==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491363" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491363" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672914" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672914" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:39 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 28/40] x86: Introduce userspace API for shadow stack Date: Sat, 18 Mar 2023 17:15:23 -0700 Message-Id: <20230319001535.23210-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 49DFA4000C X-Rspam-User: X-Stat-Signature: 65txgkyhe86h6ru54xyy3o5pmxj8fy81 X-HE-Tag: 1679185002-29150 X-HE-Meta: 
U2FsdGVkX1/WaEVLL3nTArHr5HhirC7suNJoOSfZM4bFX18T6tgKef03AQCcqUncCwky+Mzg6CPSoZFw014Dt+cbGpihPS3QUGKaA62odLLy/5gY+iVNHsgHpGRFlOBCVEP0fhCdWqLYkYWezspyn5fVuo7m6zRY6qDVpJN4IIiYP6cIc3+l5DDfFOJFkdx1Tb2WEWgMi00ygCE/350gLJBkWkwowhc+lQvX6zT/oWZGjsM7RoU2AgnJ1Tuurcjum7jsL3HLkx3YA7Dkib69bj/iNwy63f6jnWUQhJBK3ZynZ8+LgeMK9dsbxq+ncLsxvX3FkF2uJ5lG89s5C7saqB+2GZ5uMyBza57Yo+xLJFpsZ0CiLqQAoozKs8Czdr2un5PLxi1EjIyfO9omIzLsFAjIEiL5tiYe6/VEIAxwodt1D4EVioQv39NPwQvOIgnFOx5GEuuwHBmk3BpsckTNHg1xjQ6rKKXqy6mPCrv2sNg7U24zkVfLxz6wja61+XqjGN5pfssCTwjc4kQyimWI5jmr3yrJcNZTBLC0ZCUhdFgLkezl/LTJHSEFX3d7EBWaKgi/nWVYnY9jLW62lpW+fbdkDqWF5jsPB0O2JFGJYmrQ8Vfckqj1xudS0Miph+QCp3SbT51xmiDvHw+N3xgrJ7/a+52vIqcVcC21MjO5RZtsOXnDxsbR8q57WfG58+i1/CxZQBbrqQdHpDaYi6jyDUtkcs41/hy1WyZGiAvBc+sjx32JjXtapSiCW23PWZIRwQ2DToRzm9UQJJA4wG68X9tJxHEVa1KjoPIkEJJPANg96Wq8VZ4VB8Tsvf6bPAxp5Co5OsWqccjuT1wgPRTF839U3+YNZf8OBNlLry/TUcW7j5x7zxbaEwi1hBb/fvletO5uvHBXgMWepbSbMEfJlzYmlcg2VawFj0YIEhYGsvK4mrcggpB6jik5aNh/JP7vmNYmXJdku1TiRI1vA8R 2Kbl5U1M 7eVvzq3yYhYz6PQ076cj3g2QF8etzsSIxKYnidoNjQHWWzQylwuyPDPKcLEV8VuxPQGmwyyqc9FyACj1Mk0yQRfdkqFIVvc4W3MsL1SKbSZQgI2tHsN2hb6LmKiIS3hXhJoLmALpa/KERFeV6qy9VEVq+rfaj08HG04z+m/R0xG8Rew1vKhYl4yOqkxby0lkiQfCr6yYepJTQQhpZpEcDK23n0ssSYFkZJxFowRp9c6JlldOhwQDyG8Y/uIcfGgR15QTsKYF4htlPhDfZ9Roq1TqA5sa1W8DUjqC+3hnGWMOSwUSZh9EXTT0joZu1qi8cH3lmCix0PdfpRuKwHlr799aPSBt66ABMmVya X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add three new arch_prctl() handles: - ARCH_SHSTK_ENABLE/DISABLE enables or disables the specified feature. Returns 0 on success or a negative value on error. - ARCH_SHSTK_LOCK prevents future disabling or enabling of the specified feature. Returns 0 on success or a negative value on error. The features are handled per-thread and inherited over fork(2)/clone(2), but reset on exec(). Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. 
Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- This is a preparation patch. It does not implement any features. The {} are to make the diff in a future patch cleaner. v8: - Update commit log verbiage (Boris) v4: - Remove references to CET and replace with shadow stack (Peterz) v3: - Move shstk.c Makefile changes earlier (Kees) - Add #ifdef around features_locked and features (Kees) - Encapsulate features reset earlier in reset_thread_features() so features and features_locked are not referenced in code that would be compiled !CONFIG_X86_USER_SHADOW_STACK. (Kees) - Fix typo in commit log (Kees) - Switch arch_prctl() numbers to avoid conflict with LAM v2: - Only allow one enable/disable per call (tglx) - Return error code like a normal arch_prctl() (Alexander Potapenko) - Make CET only (tglx) --- arch/x86/include/asm/processor.h | 6 +++++ arch/x86/include/asm/shstk.h | 21 +++++++++++++++ arch/x86/include/uapi/asm/prctl.h | 6 +++++ arch/x86/kernel/Makefile | 2 ++ arch/x86/kernel/process_64.c | 7 ++++- arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ 6 files changed, 85 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/shstk.h create mode 100644 arch/x86/kernel/shstk.c diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 8d73004e4cac..bd16e012b3e9 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -28,6 +28,7 @@ struct vm86; #include #include #include +#include #include #include @@ -475,6 +476,11 @@ struct thread_struct { */ u32 pkru; +#ifdef CONFIG_X86_USER_SHADOW_STACK + unsigned long features; + unsigned long features_locked; +#endif + /* Floating point and extended processor state */ struct fpu fpu; /* diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h new file mode 100644 index 000000000000..ec753809f074 --- /dev/null +++ 
b/arch/x86/include/asm/shstk.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SHSTK_H +#define _ASM_X86_SHSTK_H + +#ifndef __ASSEMBLY__ +#include + +struct task_struct; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +long shstk_prctl(struct task_struct *task, int option, unsigned long features); +void reset_thread_features(void); +#else +static inline long shstk_prctl(struct task_struct *task, int option, + unsigned long arg2) { return -EINVAL; } +static inline void reset_thread_features(void) {} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ + +#endif /* __ASSEMBLY__ */ + +#endif /* _ASM_X86_SHSTK_H */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f18..b2b3b7200b2d 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,4 +20,10 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +/* Don't use 0x3001-0x3004 because of old glibcs */ + +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 92446f1dedd7..b366641703e3 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -146,6 +146,8 @@ obj-$(CONFIG_CALL_THUNKS) += callthunks.o obj-$(CONFIG_X86_CET) += cet.o +obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o + ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index bb65a68b4b49..9bbad1763e33 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip, load_gs_index(__USER_DS); } + reset_thread_features(); + loadsegment(fs, 0); loadsegment(es, _ds); loadsegment(ds, _ds); @@ -830,7 +832,10 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2) case ARCH_MAP_VDSO_64: return 
prctl_map_vdso(&vdso_image_64, arg2); #endif - + case ARCH_SHSTK_ENABLE: + case ARCH_SHSTK_DISABLE: + case ARCH_SHSTK_LOCK: + return shstk_prctl(task, option, arg2); default: ret = -EINVAL; break; diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c new file mode 100644 index 000000000000..41ed6552e0a5 --- /dev/null +++ b/arch/x86/kernel/shstk.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * shstk.c - Intel shadow stack support + * + * Copyright (c) 2021, Intel Corporation. + * Yu-cheng Yu + */ + +#include +#include +#include + +void reset_thread_features(void) +{ + current->thread.features = 0; + current->thread.features_locked = 0; +} + +long shstk_prctl(struct task_struct *task, int option, unsigned long features) +{ + if (option == ARCH_SHSTK_LOCK) { + task->thread.features_locked |= features; + return 0; + } + + /* Don't allow via ptrace */ + if (task != current) + return -EINVAL; + + /* Do not allow to change locked features */ + if (features & task->thread.features_locked) + return -EPERM; + + /* Only support enabling/disabling one feature at a time. 
*/ + if (hweight_long(features) > 1) + return -EINVAL; + + if (option == ARCH_SHSTK_DISABLE) { + return -EINVAL; + } + + /* Handle ARCH_SHSTK_ENABLE */ + return -EINVAL; +} From patchwork Sun Mar 19 00:15:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180157 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66CA4C76196 for ; Sun, 19 Mar 2023 00:16:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0258028001B; Sat, 18 Mar 2023 20:16:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E5563280001; Sat, 18 Mar 2023 20:16:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C103A28001B; Sat, 18 Mar 2023 20:16:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 85C0D280001 for ; Sat, 18 Mar 2023 20:16:46 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 5A6B880CAD for ; Sun, 19 Mar 2023 00:16:46 +0000 (UTC) X-FDA: 80583732012.09.4047C83 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf21.hostedemail.com (Postfix) with ESMTP id 20E8E1C0018 for ; Sun, 19 Mar 2023 00:16:43 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ag6yogTs; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; 
d=hostedemail.com; s=arc-20220608; t=1679185004; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=o3Rz70ck6nM+CTFRpD6v5ZMhHD7aV88XHo5GuS7VB1o=; b=XBSAvkrZ4tL6ZsXCF8sxm1aIZWkLr/UoKo90ddLQLm3a3l/r1Xzgen4vD8Ress5TuAK7s4 7zo0jP9BQwpFsxLIE1+TauxwXTJzJVqBvKlXVCfs3VaqLGSGGpRSyykpFkIUnSHBvIfYtq OIRnx6hKPHcRTQ+zYqoQDQUQdaE0Pkk= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ag6yogTs; spf=pass (imf21.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185004; a=rsa-sha256; cv=none; b=SxTa7BvVtsZcVYCvoHzsGAs4JrawlqPXLT+haCXSr69nH4IepRn/dZjqR/GKRlzMcZQcX4 oN5hcAPhg78vszVouhQHi9g/LV6kNl3n1TUIqTvVZsiPUNYLYhnszb67AEJkOT3zd6nyW0 M7yFvrA03uZkURPSBoLDljxuM1ZIVig= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185004; x=1710721004; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=qFnC8BjwG9ajYhpSUihMC4CLPCFBVSbSaq8Aw+3zJPY=; b=ag6yogTsZbZuW1DUHkTjXOpJzuDAXCL5LGr6xf4PayfzMNWj9cjpB1R0 31m2mP+oKhCEf+MxfysvRj+plueAypUFz8C7g7XoQd71TyPS/sGmgjNK5 wHZAlujV0DcA+gaJNk1FlEz4KV3uf2crqCtnvvvWEmbJAHyBUxc7rv9oR 2iMQy/x1/gDMalKsElXGuXbBeQLaPtFaGJdvvbe7X61LoMn9XmrzI3LzI Phfeq0GwqHsb53uVlJwZZAR6XPcC5vdAw2Qkzmni6asmd4kFsUv53FQus uPvozB5ERjoIiMFapkTU0kkpjKv1M62rDM7KpxiaFQj0c2xUqYsH77YL7 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491387" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491387" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: 
E=McAfee;i="6600,9927,10653"; a="749672927" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672927" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:41 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 29/40] x86/shstk: Add user-mode shadow stack support Date: Sat, 18 Mar 2023 17:15:24 -0700 Message-Id: <20230319001535.23210-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 20E8E1C0018 X-Rspam-User: X-Stat-Signature: ict4xjdqrmsxdqwu9bu4z36pjh674uc4 X-HE-Tag: 1679185003-601331 X-HE-Meta: 
U2FsdGVkX1+S2FBELCi2hexOd5OIDqngz8dG1ReUawVDBfBtViEBPxIpisXHDC0Vr7QkdXx8K0MIOj+sAy80qK3m3TlVgByLCy1SwGAdyUYeaeBmrwjHw4VBcmaQrvC9DQ1fui3QJVLL953m33Qiaw4EFweMgDaPMmCr1YYWafUkSuDJxpFQkH7w0yTwo3lyfDqNlX19PW0kGMapfPB/jcA60iIJdSOoDgcdHTgqRzMVGZb5NU1GTG85I2xBiZSi70dweTSCHIOyzeY/Vlzk6ekVso4Q4ZBk4ENFF6jfWFHkMCF38NILAzA/2FcSMm6PpP212npN4iDuWwvPSKDFj6Z/vF6IFzwvQkyGvDanKexsrvo4/Zzuj6jMKlSNvM+9ofyUnLOoOg96GhdCa0isJq/o5ouLKmNvrIc/llRbkz37fuhC4+4Wv97qK0jylC0YQT2oaM/ioPc6i5AXXAJ88DK2FpKnwEW3Kvkh3PGpDDzDtfqfv5tsTmnkbYtmGnXX5c8Br8hgSTnPrUGNym19XeXvalZ8IvNoOKnYoCtckZKBIBAdzK3IBA2mtb7sa9DBb0CXA+M5f7od1DDdTni82Rr8J1TFf829kqr1nkZUS/mNZnYATmT4XIrUg43SuPd8fL39u/JMNcPYNME5CetU0aCVHOpGz/POnunUj7XgcqhC1J3BnExF+Id7snIyE2+xraO7WSAlS4+ryGB/rZvDdSFOwrY3jlvYklfuf/5mkOrvGk/KtRsf7BWbZZOtHfna7UsMdi0Hi8r9TXYveteCV0QiOh5Jh8/GAddpL1jFozx6vyWJMZp5jfLRSGKRV6SDvVUNClLOLsAeYcgG5wydvpswjs5OyIW2Diwo9BrvviarLloj6siOY04NNiYYAXXUDtI2kRUUAro8t5zWpo0ibQpqj72jW9ixf1m/Daurs+JL0NTXsWx0UPt9c/GUt3hIxy6cbMfGjid/Jlutj9B yAjEyNPH DkWclCvXRr81lRthA4sl6Ucs7nGvaLtYz4nTUrfa01os2y1Wnu9gXtguBongTVnIZH9Vy8oIyxQtRcBb8xjsQiHx4h+GPf0evb975T1gusA9bBRao51pNTtJ7jZuOVoo21hl6Kl6jyeaX5PYNKwPCiv2CJFMVdllmmmvJsTbT9URG6S45wGp7HT//QJYGaWZjdk5F2zN/qbcXE39SaDUX5aBrWpkihMGDkWiDsgzxMEW0hjZpEMSX8qncg9JvhR1Khahzlxyx1JHbNW2kyh6HyxI2wcp76+pR43EtuLKTOhRYC4UPGa7VXbzoh7CtcQ+PFjHzde4PQN2gSOSdXOaW3YtE8rUq7APE8GxA X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Introduce basic shadow stack enabling/disabling/allocation routines. A task's shadow stack is allocated from memory with VM_SHADOW_STACK flag and has a fixed size of min(RLIMIT_STACK, 4GB). Keep the task's shadow stack address and size in thread_struct. This will be copied when cloning new threads, but needs to be cleared during exec, so add a function to do this. 32 bit shadow stack is not expected to have many users and it will complicate the signal implementation. 
So do not support IA32 emulation or x32. Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v7: - Add explanation for not supporting 32 bit in commit log (Boris) v5: - Switch to EOPNOTSUPP - Use MAP_ABOVE4G - Move set_clr_bits_msrl() to patch where it is first used v4: - Just set MSR_IA32_U_CET when disabling shadow stack, since we don't have IBT yet. (Peterz) v3: - Use define for set_clr_bits_msrl() (Kees) - Make some functions static (Kees) - Change feature_foo() to features_foo() (Kees) - Centralize shadow stack size rlimit checks (Kees) - Disable x32 support --- arch/x86/include/asm/processor.h | 2 + arch/x86/include/asm/shstk.h | 7 ++ arch/x86/include/uapi/asm/prctl.h | 3 + arch/x86/kernel/shstk.c | 145 ++++++++++++++++++++++++++++++ 4 files changed, 157 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index bd16e012b3e9..ff98cd6d5af2 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -479,6 +479,8 @@ struct thread_struct { #ifdef CONFIG_X86_USER_SHADOW_STACK unsigned long features; unsigned long features_locked; + + struct thread_shstk shstk; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h index ec753809f074..2b1f7c9b9995 100644 --- a/arch/x86/include/asm/shstk.h +++ b/arch/x86/include/asm/shstk.h @@ -8,12 +8,19 @@ struct task_struct; #ifdef CONFIG_X86_USER_SHADOW_STACK +struct thread_shstk { + u64 base; + u64 size; +}; + long shstk_prctl(struct task_struct *task, int option, unsigned long features); void reset_thread_features(void); +void shstk_free(struct task_struct *p); #else static inline long shstk_prctl(struct task_struct *task, int option, unsigned long arg2) { return -EINVAL; } static inline void reset_thread_features(void) {} +static 
inline void shstk_free(struct task_struct *p) {} #endif /* CONFIG_X86_USER_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index b2b3b7200b2d..7dfd9dc00509 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -26,4 +26,7 @@ #define ARCH_SHSTK_DISABLE 0x5002 #define ARCH_SHSTK_LOCK 0x5003 +/* ARCH_SHSTK_ features bits */ +#define ARCH_SHSTK_SHSTK (1ULL << 0) + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 41ed6552e0a5..3cb85224d856 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -8,14 +8,159 @@ #include #include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include +static bool features_enabled(unsigned long features) +{ + return current->thread.features & features; +} + +static void features_set(unsigned long features) +{ + current->thread.features |= features; +} + +static void features_clr(unsigned long features) +{ + current->thread.features &= ~features; +} + +static unsigned long alloc_shstk(unsigned long size) +{ + int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; + struct mm_struct *mm = current->mm; + unsigned long addr, unused; + + mmap_write_lock(mm); + addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + + mmap_write_unlock(mm); + + return addr; +} + +static unsigned long adjust_shstk_size(unsigned long size) +{ + if (size) + return PAGE_ALIGN(size); + + return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); +} + +static void unmap_shadow_stack(u64 base, u64 size) +{ + while (1) { + int r; + + r = vm_munmap(base, size); + + /* + * vm_munmap() returns -EINTR when mmap_lock is held by + * something else, and that lock should not be held for a + * long time. Retry it for the case. 
+ */ + if (r == -EINTR) { + cond_resched(); + continue; + } + + /* + * For all other types of vm_munmap() failure, either the + * system is out of memory or there is bug. + */ + WARN_ON_ONCE(r); + break; + } +} + +static int shstk_setup(void) +{ + struct thread_shstk *shstk = &current->thread.shstk; + unsigned long addr, size; + + /* Already enabled */ + if (features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + /* Also not supported for 32 bit and x32 */ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || in_32bit_syscall()) + return -EOPNOTSUPP; + + size = adjust_shstk_size(0); + addr = alloc_shstk(size); + if (IS_ERR_VALUE(addr)) + return PTR_ERR((void *)addr); + + fpregs_lock_and_load(); + wrmsrl(MSR_IA32_PL3_SSP, addr + size); + wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN); + fpregs_unlock(); + + shstk->base = addr; + shstk->size = size; + features_set(ARCH_SHSTK_SHSTK); + + return 0; +} + void reset_thread_features(void) { + memset(&current->thread.shstk, 0, sizeof(struct thread_shstk)); current->thread.features = 0; current->thread.features_locked = 0; } +void shstk_free(struct task_struct *tsk) +{ + struct thread_shstk *shstk = &tsk->thread.shstk; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) || + !features_enabled(ARCH_SHSTK_SHSTK)) + return; + + if (!tsk->mm) + return; + + unmap_shadow_stack(shstk->base, shstk->size); +} + +static int shstk_disable(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* Already disabled? 
*/ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return 0; + + fpregs_lock_and_load(); + /* Disable WRSS too when disabling shadow stack */ + wrmsrl(MSR_IA32_U_CET, 0); + wrmsrl(MSR_IA32_PL3_SSP, 0); + fpregs_unlock(); + + shstk_free(current); + features_clr(ARCH_SHSTK_SHSTK); + + return 0; +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { From patchwork Sun Mar 19 00:15:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180158 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E45A0C74A5B for ; Sun, 19 Mar 2023 00:16:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3F58E28001C; Sat, 18 Mar 2023 20:16:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 37F46280001; Sat, 18 Mar 2023 20:16:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1F7B628001C; Sat, 18 Mar 2023 20:16:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 0A473280001 for ; Sat, 18 Mar 2023 20:16:48 -0400 (EDT) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id D0150A0759 for ; Sun, 19 Mar 2023 00:16:47 +0000 (UTC) X-FDA: 80583732054.18.6CFD378 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf11.hostedemail.com (Postfix) with ESMTP id B4BEA4000E for ; Sun, 19 Mar 2023 00:16:45 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=jvmxjvYS; spf=pass (imf11.hostedemail.com: 
domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185006; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=W0ftUG5PHbqga3JJcvdLhbXeuZS1VD+kWPs82boa/+k=; b=Fuqh9L+FC6VLu9XH+McXuQ6aw1Z1hBhYLDfRdO6KFL/jtn0akWZDTAibal7VBB939Q9VBu VrhYDK+ToZPonGw1JvYQV/r/+xOVYbaA7nSMJG2uVmLfcTXc4rbt4D0tk8cJ/Hwj55OzrU 7vpko770mJj0geKT3BLtz0PSRbS7Cok= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=jvmxjvYS; spf=pass (imf11.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185006; a=rsa-sha256; cv=none; b=6MeaWtEKhNjuxsi4uuxioKpqt+jXe0IJA5PnhpXg8WfhwGBo0EI9R21w9Wj1uE962D37cb +W0RFPedqWrHBOwdop97IRBEO3C7NzxWqmE7+dwVED3xEynD/8UJ3DNz49Be+TWlvUZAvj NSU+KvRhoVQsdQesJTi1KFrgPjfpz3Y= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185005; x=1710721005; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=/W3M7babRpwbvz/5H38tZ0gNNiq3szmBJJYgOrNIixQ=; b=jvmxjvYScoXLChO1g5cnGuLoVSrZZIdCnodbtENbkeUSkjA+ZEflwm8R 78+c166L3NNO4xvplyJ2EamqdoNXSonxx1k/x4dzxEZNOR5CMgagmN+lK EFNfo9mF8wCkv9KYWo+hqGVmCJPLRLZpDJmzQ5YAFtbQCgrnt8qQzvaYi Zl3jlDVaVBCAp3KwRRV5yr0DlaSRAB8I+67NANqqJIPqp0PDJp8bUcLpZ vtCSyfSHtu+3nvq4qeBSdu38lLNbcRBfRnAkg48XMK9kNFZvuq0ivi3Fn U72tdnY88oDYIfoTXPQB76Qd8DNg0Jr+6BuzqGbLmAow8kVMbqcJvw7vy A==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491410" X-IronPort-AV: 
E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491410" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672937" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672937" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:43 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 30/40] x86/shstk: Handle thread shadow stack Date: Sat, 18 Mar 2023 17:15:25 -0700 Message-Id: <20230319001535.23210-31-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: B4BEA4000E X-Rspam-User: X-Stat-Signature: u8kdcezgmtwzkn8gknhne7hjrccnbif8 X-HE-Tag: 1679185005-556977 X-HE-Meta: U2FsdGVkX1/UPaB3g8ayosYWvIcqbBl/wcsA0gOX5ln6R53oKDFu9JEBKl4iiePpd8euKAOgaKKE1ORPKZwVP6T3nzd/+d38vuMQfoNlXWanMBED2M1egRJM7Z/iVyaqjhWaY8vy+pvgSV5na0tZM1i399qXsqJIrHvBftUDDQ4CjgTB5n4+Jkr+sE7tFUF1nHvI4AxN/2vdBn1rKqLEUl2XrISpDxgHPE5T9whspqz/yMBsqXm2ET4lamRcZsB0iaLGkFPfWN4Qb5/wVDP/ETqsgzUKeHqoSbCrPjbOZivQr5Rgpre8GfOC83QFn29Qqakl5gKY0gRWQYZkXcDEJLCpftg5zmh2y6A7aEIDTP3e22vYyMKJW5/9ffB7WVWZxpuOcqEyl+cIr8pGsA8HMMQhykMbhz7iPvAYzPtaDiyooraw9pcZEv2A+odVbSD8HB7iKnEudprhRe3o8wrMlntH6LSnYU4kFyQibdDY7IsA+1hbAuxwv4aZ+dcuftOfhl+GkVyj4qZCZ9lfSfHeZBigkc9lhQ4o7NB0+QVUVnohvN5BlKf8UG5XsJNr2/kdqUA4qrqlacpqvyXl86Erf/FsOGJD3VYXTciksbOpfO3D1w3sIvAcMeq8DQQKWTEzBnSWSOMNKA3c9vNGCDkY1yf6CiiHUXI0wEAvUXM01cDz6b98AZy6IOnTD3Z8PUYlxWwar4s3y4S5i5F5UCxLznEI6TgVNaDSun0YWzF8RzoKcHflgpVrTn/TJl7+oST4lMkglK0Mci3ftDD+vLcYn541CPydciiiEN26DEY132ZoVSwu0NfDYBY5JDFO2Q+DZliKgmKRoJb3oxfibWXrlG9BgSwSl0pS49vW30nQYu9wg6dncoB0/QrxkWVra/ZMXZLzEL1ogftXkgPZudJ8kP0mArz+lA94+8Fp2jZNycZgH14L7sie4fRlTbnaJdf6ME860jT3UdB1t0IJoLM r5xm6rva 
4xAnuTuhPSLFQOsX5OVEWLiGiDgW5ZJQD2luEFkuymTZ48kR6j7IKKEo+88H440E9OgX7x5tF5l+I/54dsOcJ/mRwuDlUmKUG5yQWZmIVDx9DU1dRkGFoSM/rTSrlPgqC5FJRgCG7yFnN5VQFupCpk63K6UznWPwzEk+dV7wY+thQChRrt0VvUFMz3oMrncQpw/mbCtR0nX0Ko/Llc6sBIeN7XZwdWJg23cCOhIxgMatQKKLeK0zbtV45xvsDRmHM0CMRydvsiLw02PzDoqewPnUT6Y9ZcFEzOxrhATS15XQ2gLo1Wa4J23Kn+QitziMsnFC1cGi/23/+BoIgpEiTdS6LnZ79f5uAriKt X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. 
The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent. During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner. Suggested-by: Thomas Gleixner Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v8: - Update commit log verbiage (Boris) - Move ifdef inside update_fpu_shstk() (Boris) - Change shstk_alloc_thread_stack() return value to simplify caller (Boris) - Remove extra info about glibc's vfork() implementation from log (Szabolcs Nagy) v3: - Fix update_fpu_shstk() stub (Mike Rapoport) - Fix chunks around alloc_shstk() in wrong patch (Kees) - Fix stack_size/flags swap (Kees) - Use centralized stack size logic (Kees) v2: - Have fpu_clone() take new shadow stack pointer and update SSP in xsave buffer for new task. (tglx) v1: - Expand commit log. - Add more comments. - Switch to xsave helpers. 
---
 arch/x86/include/asm/fpu/sched.h   |  3 ++-
 arch/x86/include/asm/mmu_context.h |  2 ++
 arch/x86/include/asm/shstk.h       |  5 ++++
 arch/x86/kernel/fpu/core.c         | 36 +++++++++++++++++++++++++-
 arch/x86/kernel/process.c          | 21 ++++++++++++++-
 arch/x86/kernel/shstk.c            | 41 ++++++++++++++++++++++++++++--
 6 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index c2d6cd78ed0c..3c2903bbb456 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,8 @@
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal);
+extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+		      unsigned long shstk_addr);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index e01aa74a6de7..9714f08d941b 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -147,6 +147,8 @@ do { \
 #else
 #define deactivate_mm(tsk, mm) \
 do { \
+	if (!tsk->vfork_done) \
+		shstk_free(tsk); \
 	load_gs_index(0); \
 	loadsegment(fs, 0); \
 } while (0)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 2b1f7c9b9995..d4a5c7b10cb5 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -15,11 +15,16 @@ struct thread_shstk {
 
 long shstk_prctl(struct task_struct *task, int option, unsigned long features);
 void reset_thread_features(void);
+unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
+				       unsigned long stack_size);
 void shstk_free(struct task_struct *p);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
 static inline void reset_thread_features(void) {}
+static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
+						     unsigned long clone_flags,
+						     unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 #endif /* CONFIG_X86_USER_SHADOW_STACK */
 
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index f851558b673f..aa4856b236b8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -552,8 +552,36 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 	}
 }
 
+/* A passed ssp of zero will not cause any update */
+static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
+{
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	struct cet_user_state *xstate;
+
+	/* If ssp update is not needed. */
+	if (!ssp)
+		return 0;
+
+	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+				XFEATURE_CET_USER);
+
+	/*
+	 * If there is a non-zero ssp, then 'dst' must be configured with a shadow
+	 * stack and the fpu state should be up to date since it was just copied
+	 * from the parent in fpu_clone(). So there must be a valid non-init CET
+	 * state location in the buffer.
+	 */
+	if (WARN_ON_ONCE(!xstate))
+		return 1;
+
+	xstate->user_ssp = (u64)ssp;
+#endif
+	return 0;
+}
+
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+	      unsigned long ssp)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -613,6 +641,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
 	if (use_xsave())
 		dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID;
 
+	/*
+	 * Update shadow stack pointer, in case it changed during clone.
+	 */
+	if (update_fpu_shstk(dst, ssp))
+		return 1;
+
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b650cde3f64d..8bf13cff0141 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 
 #include "process.h"
 
@@ -119,6 +120,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -140,6 +142,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
 	struct pt_regs *childregs;
+	unsigned long new_ssp;
 	int ret = 0;
 
 	childregs = task_pt_regs(p);
@@ -174,7 +177,16 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p, clone_flags, args->fn);
+	/*
+	 * Allocate a new shadow stack for thread if needed. If shadow stack
+	 * is disabled, new_ssp will remain 0, and fpu_clone() will know not to
+	 * update it.
+	 */
+	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+	if (IS_ERR_VALUE(new_ssp))
+		return PTR_ERR((void *)new_ssp);
+
+	fpu_clone(p, clone_flags, args->fn, new_ssp);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
@@ -220,6 +232,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
+	/*
+	 * If copy_thread() is failing, don't leak the shadow stack possibly
+	 * allocated in shstk_alloc_thread_stack() above.
+	 */
+	if (ret)
+		shstk_free(p);
+
 	return ret;
 }
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 3cb85224d856..bd9cdc3a7338 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -47,7 +47,7 @@ static unsigned long alloc_shstk(unsigned long size)
 	unsigned long addr, unused;
 
 	mmap_write_lock(mm);
-	addr = do_mmap(NULL, addr, size, PROT_READ, flags,
+	addr = do_mmap(NULL, 0, size, PROT_READ, flags,
 		       VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL);
 	mmap_write_unlock(mm);
 
@@ -126,6 +126,37 @@ void reset_thread_features(void)
 	current->thread.features_locked = 0;
 }
 
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+				       unsigned long stack_size)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr, size;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!features_enabled(ARCH_SHSTK_SHSTK))
+		return 0;
+
+	/*
+	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * stack.
+	 */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+	size = adjust_shstk_size(stack_size);
+	addr = alloc_shstk(size);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	shstk->base = addr;
+	shstk->size = size;
+
+	return addr + size;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
@@ -134,7 +165,13 @@ void shstk_free(struct task_struct *tsk)
 	    !features_enabled(ARCH_SHSTK_SHSTK))
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it. In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	unmap_shadow_stack(shstk->base, shstk->size);

From patchwork Sun Mar 19 00:15:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180159
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
    Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
    Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
    Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
    "Kirill A . Shutemov", John Allen, kcc@google.com,
    eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
    dethoma@microsoft.com, akpm@linux-foundation.org,
    Andrew.Cooper3@citrix.com, christina.schimpe@intel.com,
    david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 31/40] x86/shstk: Introduce routines modifying shstk
Date: Sat, 18 Mar 2023 17:15:26 -0700
Message-Id: <20230319001535.23210-32-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

Shadow stacks are normally written to via CALL/RET or specific CET
instructions like RSTORSSP/SAVEPREVSSP. However, sometimes the kernel
will need to write to the shadow stack directly using the ring-0 only
WRUSS instruction.

A shadow stack restore token marks a restore point of the shadow stack,
and the address in a token must point directly above the token, which
is within the same shadow stack.
This is distinctly different from other pointers on the shadow stack,
since those pointers point into executable code.

Introduce token setup and verify routines. Also introduce WRUSS, which
is a kernel-mode instruction but writes directly to user shadow stack.

In future patches that enable shadow stack to work with signals, the
kernel will need something to denote the point in the stack where
sigreturn may be called. This will prevent attackers from calling
sigreturn at arbitrary places in the stack, which helps prevent SROP
attacks.

To do this, something that can only be written by the kernel needs to
be placed on the shadow stack. This can be accomplished by setting bit
63 in the frame written to the shadow stack. Userspace return addresses
can't have this bit set as it is in the kernel range. It also can't be
a valid restore token.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Use define instead of magic BIT(63) (Boris)

v5:
 - Fix typo in commit log

v3:
 - Drop shstk_check_rstor_token()
 - Fail put_shstk_data() if bit 63 is set in the data (Kees)
 - Add comment in create_rstor_token() (Kees)
 - Pull in create_rstor_token() changes from future patch (Kees)

v2:
 - Add data helpers for writing to shadow stack.
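
To make the two on-stack encodings from the log concrete, here is a
small userspace sketch. It only does the arithmetic: a restore token
lives in the 8 bytes just below the SSP it restores to and holds that
SSP with bit 0 set (64-bit mode), while a kernel "data" frame is the
payload with bit 63 set. All function and macro names here are
illustrative assumptions, not the helpers added by this patch, and no
WRUSS is involved.

```c
#include <stdint.h>

#define SK_FRAME_SIZE	8u
#define SK_TOKEN_64BIT	(UINT64_C(1) << 0)	/* mode bit of a restore token */
#define SK_DATA_BIT	(UINT64_C(1) << 63)	/* marks kernel-written data */

/* A token can only be created for an 8-byte aligned SSP. */
static int rstor_token_ssp_ok(uint64_t ssp)
{
	return (ssp % SK_FRAME_SIZE) == 0;
}

/* The token is stored directly below the SSP it points back to. */
static uint64_t rstor_token_addr(uint64_t ssp)
{
	return ssp - SK_FRAME_SIZE;
}

/* Token contents: the aligned SSP with the 64-bit mode bit set. */
static uint64_t rstor_token_value(uint64_t ssp)
{
	return ssp | SK_TOKEN_64BIT;
}

/* Data frame: payload with bit 63 set. Payload must not use bit 63. */
static uint64_t encode_shstk_data(uint64_t data)
{
	return data | SK_DATA_BIT;
}

/* Only frames carrying bit 63 are accepted as kernel-written data. */
static int is_shstk_data(uint64_t frame)
{
	return (frame & SK_DATA_BIT) != 0;
}

static uint64_t decode_shstk_data(uint64_t frame)
{
	return frame & ~SK_DATA_BIT;
}
```

An ordinary userspace return address such as 0x401000 fails
is_shstk_data(), which is the property the signal patches rely on.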
---
 arch/x86/include/asm/special_insns.h | 13 +++++
 arch/x86/kernel/shstk.c              | 75 ++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index de48d1389936..d6cd9344f6c7 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -202,6 +202,19 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+static inline int write_user_shstk_64(u64 __user *addr, u64 val)
+{
+	asm_volatile_goto("1: wrussq %[val], (%[addr])\n"
+			  _ASM_EXTABLE(1b, %l[fail])
+			  :: [addr] "r" (addr), [val] "r" (val)
+			  :: fail);
+	return 0;
+fail:
+	return -EFAULT;
+}
+#endif /* CONFIG_X86_USER_SHADOW_STACK */
+
 #define nop() asm volatile ("nop")
 
 static inline void serialize(void)
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index bd9cdc3a7338..e22928c63ffc 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -25,6 +25,8 @@
 #include
 #include
 
+#define SS_FRAME_SIZE 8
+
 static bool features_enabled(unsigned long features)
 {
 	return current->thread.features & features;
@@ -40,6 +42,35 @@ static void features_clr(unsigned long features)
 	current->thread.features &= ~features;
 }
 
+/*
+ * Create a restore token on the shadow stack. A token is always 8-byte
+ * and aligned to 8.
+ */
+static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
+{
+	unsigned long addr;
+
+	/* Token must be aligned */
+	if (!IS_ALIGNED(ssp, 8))
+		return -EINVAL;
+
+	addr = ssp - SS_FRAME_SIZE;
+
+	/*
+	 * SSP is aligned, so reserved bits and mode bit are a zero, just mark
+	 * the token 64-bit.
+	 */
+	ssp |= BIT(0);
+
+	if (write_user_shstk_64((u64 __user *)addr, (u64)ssp))
+		return -EFAULT;
+
+	if (token_addr)
+		*token_addr = addr;
+
+	return 0;
+}
+
 static unsigned long alloc_shstk(unsigned long size)
 {
 	int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G;
@@ -157,6 +188,50 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl
 	return addr + size;
 }
 
+static unsigned long get_user_shstk_addr(void)
+{
+	unsigned long long ssp;
+
+	fpregs_lock_and_load();
+
+	rdmsrl(MSR_IA32_PL3_SSP, ssp);
+
+	fpregs_unlock();
+
+	return ssp;
+}
+
+#define SHSTK_DATA_BIT BIT(63)
+
+static int put_shstk_data(u64 __user *addr, u64 data)
+{
+	if (WARN_ON_ONCE(data & SHSTK_DATA_BIT))
+		return -EINVAL;
+
+	/*
+	 * Mark the high bit so that the sigframe can't be processed as a
+	 * return address.
+	 */
+	if (write_user_shstk_64(addr, data | SHSTK_DATA_BIT))
+		return -EFAULT;
+	return 0;
+}
+
+static int get_shstk_data(unsigned long *data, unsigned long __user *addr)
+{
+	unsigned long ldata;
+
+	if (unlikely(get_user(ldata, addr)))
+		return -EFAULT;
+
+	if (!(ldata & SHSTK_DATA_BIT))
+		return -EINVAL;
+
+	*data = ldata & ~SHSTK_DATA_BIT;
+
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;

From patchwork Sun Mar 19 00:15:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180160
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org,
    linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
    Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
    Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
    Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
    Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
    "Kirill A . Shutemov", John Allen, kcc@google.com,
    eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
    dethoma@microsoft.com, akpm@linux-foundation.org,
    Andrew.Cooper3@citrix.com, christina.schimpe@intel.com,
    david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v8 32/40] x86/shstk: Handle signals for shadow stack
Date: Sat, 18 Mar 2023 17:15:27 -0700
Message-Id: <20230319001535.23210-33-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

When a signal is handled, the context is pushed to the stack before
handling it. For shadow stacks, since the shadow stack only tracks
return addresses, there isn't any state that needs to be pushed.
However, there are still a few things that need to be done. These
things are visible to userspace and will be kernel ABI for shadow
stacks.

One is to make sure the restorer address is written to shadow stack,
since the signal handler (if not changing ucontext) returns to the
restorer, and the restorer calls sigreturn. So add the restorer on the
shadow stack before handling the signal, so there is not a conflict
when the signal handler returns to the restorer.

The other thing to do is to place some type of checkable token on the
thread's shadow stack before handling the signal and check it during
sigreturn. This is an extra layer of protection to hamper attackers
calling sigreturn manually as in SROP-like attacks.

For this token the shadow stack data format defined earlier can be
used. Have the data pushed be the previous SSP. Storing the SSP
(instead of a restore offset or something) leaves room for a future
sigreturn that wants to restore to a different stack.

So, when handling a signal push
 - the previous SSP, in the shadow stack data format
 - the restorer address, below the saved SSP.

In sigreturn, verify SSP is stored in the data format and pop the
shadow stack.
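
The push/pop sequence above can be walked through with a small
userspace simulation. This is only a sketch under stated assumptions:
the shadow stack is a plain array, the sk_ names and the addresses are
invented for the example, and the handler's RET is modelled by a
helper that pops one slot. The userspace-range check done by the real
sigreturn path is left out.

```c
#include <stdint.h>

#define SK_FRAME_SIZE	8u
#define SK_DATA_BIT	(UINT64_C(1) << 63)
#define SK_SLOTS	16u

/* Simulated user shadow stack: slot[i] lives at address base + 8 * i. */
struct sk_shstk {
	uint64_t base;
	uint64_t ssp;
	uint64_t slot[SK_SLOTS];
};

static uint64_t *sk_at(struct sk_shstk *s, uint64_t addr)
{
	return &s->slot[(addr - s->base) / SK_FRAME_SIZE];
}

/* Signal delivery: push the old SSP in data format, then the restorer. */
static int sk_signal_setup(struct sk_shstk *s, uint64_t restorer)
{
	uint64_t old_ssp = s->ssp;

	if (old_ssp % SK_FRAME_SIZE)
		return -1;
	if (old_ssp < s->base + 2 * SK_FRAME_SIZE ||
	    old_ssp > s->base + SK_SLOTS * SK_FRAME_SIZE)
		return -1;	/* no room in the simulated mapping */

	*sk_at(s, old_ssp - SK_FRAME_SIZE) = old_ssp | SK_DATA_BIT;
	*sk_at(s, old_ssp - 2 * SK_FRAME_SIZE) = restorer;
	s->ssp = old_ssp - 2 * SK_FRAME_SIZE;
	return 0;
}

/* The handler's RET pops the restorer address off the shadow stack. */
static uint64_t sk_handler_ret(struct sk_shstk *s)
{
	uint64_t target = *sk_at(s, s->ssp);

	s->ssp += SK_FRAME_SIZE;
	return target;
}

/* sigreturn: the frame at SSP must carry bit 63; restore the saved SSP. */
static int sk_sigreturn(struct sk_shstk *s)
{
	uint64_t frame, saved;

	if (s->ssp % SK_FRAME_SIZE || s->ssp < s->base ||
	    s->ssp >= s->base + SK_SLOTS * SK_FRAME_SIZE)
		return -1;

	frame = *sk_at(s, s->ssp);
	if (!(frame & SK_DATA_BIT))
		return -1;	/* looks like a return address: reject */

	saved = frame & ~SK_DATA_BIT;
	if (saved % SK_FRAME_SIZE)
		return -1;

	s->ssp = saved;
	return 0;
}

/* Round trip; returns 0 if every step behaves as described. */
static int sk_demo(void)
{
	struct sk_shstk s = {
		.base = 0x1000,
		.ssp = 0x1000 + SK_SLOTS * SK_FRAME_SIZE,
	};
	uint64_t start = s.ssp;

	if (sk_signal_setup(&s, 0x401000))
		return 1;
	/* A sigreturn issued too early finds the restorer address, not data. */
	if (sk_sigreturn(&s) == 0)
		return 2;
	if (sk_handler_ret(&s) != 0x401000)
		return 3;
	if (sk_sigreturn(&s))
		return 4;
	return s.ssp == start ? 0 : 5;
}
```

The early sigreturn in sk_demo() is the SROP-style case: SSP points at
an ordinary address without bit 63, so the check fails and SSP is left
unchanged.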

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Remove duplicate alignment check (Boris)

v3:
 - Drop shstk_setup_rstor_token() (Kees)
 - Drop x32 signal support, since x32 support is dropped

v2:
 - Switch to new shstk signal format

v1:
 - Use xsave helpers.
 - Expand commit log.
---
 arch/x86/include/asm/shstk.h |  5 ++
 arch/x86/kernel/shstk.c      | 95 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/signal.c     |  1 +
 arch/x86/kernel/signal_64.c  |  6 +++
 4 files changed, 107 insertions(+)

diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index d4a5c7b10cb5..ecb23a8ca47d 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -6,6 +6,7 @@
 #include
 
 struct task_struct;
+struct ksignal;
 
 #ifdef CONFIG_X86_USER_SHADOW_STACK
 struct thread_shstk {
@@ -18,6 +19,8 @@ void reset_thread_features(void);
 unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
 				       unsigned long stack_size);
 void shstk_free(struct task_struct *p);
+int setup_signal_shadow_stack(struct ksignal *ksig);
+int restore_signal_shadow_stack(void);
 #else
 static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
@@ -26,6 +29,8 @@ static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
 						     unsigned long clone_flags,
 						     unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
+static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
+static inline int restore_signal_shadow_stack(void) { return 0; }
 #endif /* CONFIG_X86_USER_SHADOW_STACK */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index e22928c63ffc..f02e8ea4f1b5 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -232,6 +232,101 @@ static int get_shstk_data(unsigned long *data, unsigned long __user *addr)
 	return 0;
 }
 
+static int shstk_push_sigframe(unsigned long *ssp)
+{
+	unsigned long target_ssp = *ssp;
+
+	/* Token must be aligned */
+	if (!IS_ALIGNED(target_ssp, 8))
+		return -EINVAL;
+
+	*ssp -= SS_FRAME_SIZE;
+	if (put_shstk_data((void *__user)*ssp, target_ssp))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int shstk_pop_sigframe(unsigned long *ssp)
+{
+	unsigned long token_addr;
+	int err;
+
+	err = get_shstk_data(&token_addr, (unsigned long __user *)*ssp);
+	if (unlikely(err))
+		return err;
+
+	/* Restore SSP aligned? */
+	if (unlikely(!IS_ALIGNED(token_addr, 8)))
+		return -EINVAL;
+
+	/* SSP in userspace? */
+	if (unlikely(token_addr >= TASK_SIZE_MAX))
+		return -EINVAL;
+
+	*ssp = token_addr;
+
+	return 0;
+}
+
+int setup_signal_shadow_stack(struct ksignal *ksig)
+{
+	void __user *restorer = ksig->ka.sa.sa_restorer;
+	unsigned long ssp;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+	    !features_enabled(ARCH_SHSTK_SHSTK))
+		return 0;
+
+	if (!restorer)
+		return -EINVAL;
+
+	ssp = get_user_shstk_addr();
+	if (unlikely(!ssp))
+		return -EINVAL;
+
+	err = shstk_push_sigframe(&ssp);
+	if (unlikely(err))
+		return err;
+
+	/* Push restorer address */
+	ssp -= SS_FRAME_SIZE;
+	err = write_user_shstk_64((u64 __user *)ssp, (u64)restorer);
+	if (unlikely(err))
+		return -EFAULT;
+
+	fpregs_lock_and_load();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	fpregs_unlock();
+
+	return 0;
+}
+
+int restore_signal_shadow_stack(void)
+{
+	unsigned long ssp;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+	    !features_enabled(ARCH_SHSTK_SHSTK))
+		return 0;
+
+	ssp = get_user_shstk_addr();
+	if (unlikely(!ssp))
+		return -EINVAL;
+
+	err = shstk_pop_sigframe(&ssp);
+	if (unlikely(err))
+		return err;
+
+	fpregs_lock_and_load();
+	wrmsrl(MSR_IA32_PL3_SSP, ssp);
+	fpregs_unlock();
+
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 004cb30b7419..356253e85ce9 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 
 static inline int is_ia32_compat_frame(struct ksignal *ksig)
 {
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index 0e808c72bf7e..cacf2ede6217 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -175,6 +175,9 @@ int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
 	frame = get_sigframe(ksig, regs, sizeof(struct rt_sigframe), &fp);
 	uc_flags = frame_uc_flags(regs);
 
+	if (setup_signal_shadow_stack(ksig))
+		return -EFAULT;
+
 	if (!user_access_begin(frame, sizeof(*frame)))
 		return -EFAULT;
 
@@ -260,6 +263,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
 	if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags))
 		goto badframe;
 
+	if (restore_signal_shadow_stack())
+		goto badframe;
+
 	if (restore_altstack(&frame->uc.uc_stack))
 		goto badframe;
 

From patchwork Sun Mar 19 00:15:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180162
a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185012; x=1710721012; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=yPUIYMNlxQJ7Nq6500igNTRtdiMTwrmDSrCHmUaqViI=; b=lQdokpHLbDQF8QZp1+v4J+/OY+b0DSmR3jCRtA37JxgFHiMgTPQlsamY zBN5mlf9yR0GAsEfQ1KpvdXfu9wyQ0+towsfmiyxFDGcta/abl60dehyS 6jXuvR/mW3peC9nuwm9/YJPCFlLebeO9LzyTykikuglLEnRygNWkUAmiH mJIV0tiJa9aHEI6DdhQC39fEuQbPBbpAgfjSakhQ0vcsZfg3+phH6pKoD 1ZvLPfLTuLzW5ekBTh3/UnmP+IaH3g5L/eckiX7uMitxFfR4guSVe2FxS XMUAqkVhaMBvPNA9GoIxEWBY9LrLPQjEBGi9DTIs956/6hw1YGhNq4CBN Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491478" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491478" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672961" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672961" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:48 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 33/40] x86/shstk: Introduce map_shadow_stack syscall Date: Sat, 18 Mar 2023 17:15:28 -0700 Message-Id: <20230319001535.23210-34-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspam-User: X-Rspamd-Queue-Id: B461C80014 X-Rspamd-Server: rspam01 X-Stat-Signature: txbedyzps61mde4x1zpjddwd9gz6e8pz X-HE-Tag: 1679185012-529417 X-HE-Meta: U2FsdGVkX19VYi70E8PH5V1DpbhXM2SOqwdaQjbPsEZMZ77jNyhRckLzq/z80A3Kjl8oCO4xKtpBLyGNXM8YQ3guPo09GByQoNcTXoJW33geuCcdytpCRKIepti3OYamf0d28dL79U8hbHBzRiVnHBdne6puDWWUTPhjtiFjZY3ym5nk8IvBZOdgaQHYCTg3c0bdGQ1Nrsi6Om7oLoDf9hJjOQeEGP5vujStnbYSz+Br6U+Ynmd58HbCeDojckAQz9nvbgJG1z+K85jOcC4P+SIc8Cq3VirRfm1UPplrEnmtW5PBen0BFlCF/YG8g6R8e03J6FV13fjRbjlhXqAdG44X5LE++bNKfoQX0nMGEAzAFHEhg+xGLXmWEgTzmzOYER+bMD7AVaIEIMn3jDp67z1ghqMYtAUKcdwSo7YGdxqV5ZXBjcPvqgXhPb9v1Qx2qxfOlKKGX6GFm4clQe8Lk5QKCqzZG4qAh1+60Hn8/dt7pQWadMK4qJUiNSr3+onGrBE0b6kIb4Evm19BKkh1M5KPCzdny3sy/5ivS3SvKj3Y1mvawmOf48GdxwsLUQWv5qg2TyfhE+NX/nHLx0EjZnowXOrTTevNFRt/xDGTNNr2n6rUiC8IbDX6kEgm5g4+alb2YRVeIyL3ErNrtWn4V7xA19TK93mfvJBXhGArujHhAPj+mpKnx2fPac8dYtUomHrh9Nhb528FOYg83oSRtQmqMR4jmQMTNpRVgw5NxCxff4hpskElv5w2veSIYhGvjHo3vwdQ3qHXU94StJZFwPIuyZElwntv+u4UUjxrqXu8ijd6ZhA+ebooEJQiYNPXsEeg+Y+1qgrMpFq4fMG8u9Ti37OUesUXKrE8FgZuioBTIh2FxVJRSckG2qa288pZnaoA1e/GPbXVNnPgoNpItcFqGDccht3RPVIimPYj88iWD8rehimQXj3RD2XyQflko1RFP9moEsH09mz5NmE 61+wsmfd 
/b+oqJpg/HhHbhxZtmIXlB8usDN90/LQp4gLyNzQbM/zTLhzahqbiPJBlZ4nX2LS80g6BGmxg+8t+YKGJ90e/AURhwzxaRruEP8rD4bqf2MC9iBMLHRk3JPJFbAVPjTjRweXLvK9OMPW0KIW29YhGNsQjUp2+k1aTrV6WY7ArzmfapLl78nuMVlpfBgKJs7kWjAGjqK6gwqfNCKEJBYChBr+K3OkTQEQX66qJAFR82nIGWj47Ja6dUcExM+E92VmCCq8DmLuXxf3Z7ftnbouX0irjocdYR4B3MGGOJb6ME+N+qsLd9iwFZzVzgjeMUJJkizuO+Mse1TZGTD+VAHYRjheFPCpJ/Ud3ptbavo+MWBhp3kQurBIxdFwIF1cFkDHflx8MLbMW1Nkpc+N2apZjfKACZQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace allocating and pivoting to userspace managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be setup with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But, the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace, as to how userspace can provision this special data, without allowing for the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. 
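For orientation, the restore token that the kernel provisions can be sketched in plain C. This is an illustration under stated assumptions, not the kernel's code: the helper rstor_token_sketch() and its struct are hypothetical names, and the layout (a 64-bit token stored 8 bytes below the target SSP, holding that SSP with bit 0 set to mark a 64-bit mode token) is assumed from the architectural description of RSTORSSP tokens rather than taken from the hunks in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: where a restore token lives and what it contains. */
struct rstor_token {
	uint64_t addr;	/* address of the 8-byte token slot */
	uint64_t value;	/* value stored in that slot */
};

/*
 * Hypothetical helper: compute the token for a shadow stack pointer 'ssp'.
 * Assumption: the token sits directly below 'ssp' and records 'ssp' itself
 * with bit 0 set (64-bit mode). Returns 0 on success, -1 if 'ssp' is not
 * 8-byte aligned.
 */
int rstor_token_sketch(uint64_t ssp, struct rstor_token *tok)
{
	assert(tok != NULL);

	if (ssp & 7)
		return -1;

	tok->addr = ssp - 8;
	tok->value = ssp | 1;
	return 0;
}
```

Userspace cannot store this value with ordinary writes to shadow stack memory, which is why the kernel (via WRUSS) or an explicitly enabled WRSS has to do it.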
First, a new madvise() flag was explored, which could operate on the
PROT_SHADOW_STACK memory. This had a few downsides:

1. Extra checks were needed in mprotect() to prevent writable memory from
   ever becoming PROT_SHADOW_STACK.
2. Extra checks/vma state were needed in the new madvise() to prevent
   restore tokens being written into the middle of pre-used shadow stacks.
   It is ideal to prevent restore tokens being added at arbitrary
   locations, so the check was to make sure the shadow stack had never
   been written to.
3. It stood out from the rest of the madvise flags, as more of a direct
   action than a hint at future desired behavior.

So rather than repurpose two existing syscalls (mmap, madvise) that don't
quite fit, just implement a new map_shadow_stack syscall to allow
userspace to map and set up new shadow stacks in one step. While ucontext
is the primary motivator, userspace may have other unforeseen reasons to
set up its own shadow stacks using the WRSS instruction. Towards this,
provide a flag so that stacks can optionally be set up securely for the
common case of ucontext without enabling WRSS. Or, potentially, have the
kernel set up the shadow stack in some new way.
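As a rough userspace illustration of this one-step interface, the sketch below wraps the raw syscall and mirrors the argument checks the syscall body performs. Several things here are assumptions: the syscall number 451 is the one added to syscall_64.tbl by this patch and may differ in a merged kernel, the flag value copies SHADOW_STACK_SET_TOKEN from the uapi hunk, a 4 KiB page size is assumed for the overflow check, and shstk_args_check() and map_shadow_stack_sketch() are hypothetical names that do not exist in the patch or in libc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumptions: values as they appear in this patch, not a stable ABI. */
#define SKETCH_NR_MAP_SHADOW_STACK	451
#define SKETCH_SHADOW_STACK_SET_TOKEN	(1ULL << 0)
#define SKETCH_SZ_4G			0x100000000ULL
#define SKETCH_PAGE_SIZE		4096ULL

/*
 * Hypothetical helper: predict the error the syscall would return for bad
 * arguments, in the same order as the kernel checks. The CPU feature check
 * (-EOPNOTSUPP) cannot be mirrored here. Returns 0 or a negative errno.
 */
int shstk_args_check(uint64_t addr, uint64_t size, unsigned int flags)
{
	uint64_t aligned;

	if (flags & ~SKETCH_SHADOW_STACK_SET_TOKEN)
		return -EINVAL;
	/* A restore token needs 8 bytes of room. */
	if ((flags & SKETCH_SHADOW_STACK_SET_TOKEN) && size < 8)
		return -ENOSPC;
	if (addr && addr < SKETCH_SZ_4G)
		return -ERANGE;
	/* Page-aligning the size must not wrap around. */
	aligned = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	if (aligned < size)
		return -EOVERFLOW;
	return 0;
}

/*
 * Hypothetical wrapper around the raw syscall. Returns the mapped address,
 * or NULL with errno set on failure.
 */
void *map_shadow_stack_sketch(void *addr, size_t size, unsigned int flags)
{
	long ret;

	/* The syscall is 64-bit only; pointers must fit in a long. */
	assert(sizeof(long) == sizeof(void *));

	ret = syscall(SKETCH_NR_MAP_SHADOW_STACK, addr, size, flags);
	return ret == -1 ? NULL : (void *)ret;
}
```

A caller would typically pass a NULL address to let the kernel choose the placement, for example map_shadow_stack_sketch(NULL, 8192, SKETCH_SHADOW_STACK_SET_TOKEN), and treat a NULL return as failure.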
The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shstk = map_shadow_stack(addr, stack_size, SHADOW_STACK_SET_TOKEN); Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v8: - Update commit log verbiage (Boris) - Use SZ_4G (Boris) - Return different error codes for each reason (Boris) v5: - Fix addr/mapped_addr (Kees) - Switch to EOPNOTSUPP (Kees suggested ENOTSUPP, but checkpatch suggests this) - Return error for addresses below 4G v3: - Change syscall common -> 64 (Kees) - Use bit shift notation instead of 0x1 for uapi header (Kees) - Call do_mmap() with MAP_FIXED_NOREPLACE (Kees) - Block unsupported flags (Kees) - Require size >= 8 to set token (Kees) v2: - Change syscall to take address like mmap() for CRIU's usage --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 3 ++ arch/x86/kernel/shstk.c | 59 ++++++++++++++++++++++---- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 6 files changed, 58 insertions(+), 9 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..f65c671ce3b1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 64 map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 5a0256e73f1e..8148bdddbd2c 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -13,6 +13,9 @@ ((key) & 0x8 ? 
VM_PKEY_BIT3 : 0)) #endif +/* Flags for map_shadow_stack(2) */ +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */ + #include #endif /* _ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index f02e8ea4f1b5..6d2531ce661c 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -71,19 +72,31 @@ static int create_rstor_token(unsigned long ssp, unsigned long *token_addr) return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long addr, unsigned long size, + unsigned long token_offset, bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_ABOVE4G; struct mm_struct *mm = current->mm; - unsigned long addr, unused; + unsigned long mapped_addr, unused; - mmap_write_lock(mm); - addr = do_mmap(NULL, 0, size, PROT_READ, flags, - VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); + if (addr) + flags |= MAP_FIXED_NOREPLACE; + mmap_write_lock(mm); + mapped_addr = do_mmap(NULL, addr, size, PROT_READ, flags, + VM_SHADOW_STACK | VM_WRITE, 0, &unused, NULL); mmap_write_unlock(mm); - return addr; + if (!set_res_tok || IS_ERR_VALUE(mapped_addr)) + goto out; + + if (create_rstor_token(mapped_addr + token_offset, NULL)) { + vm_munmap(mapped_addr, size); + return -EINVAL; + } + +out: + return mapped_addr; } static unsigned long adjust_shstk_size(unsigned long size) @@ -134,7 +147,7 @@ static int shstk_setup(void) return -EOPNOTSUPP; size = adjust_shstk_size(0); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return PTR_ERR((void *)addr); @@ -178,7 +191,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long cl return 0; size = adjust_shstk_size(stack_size); - addr = alloc_shstk(size); + addr = alloc_shstk(0, size, 0, false); if (IS_ERR_VALUE(addr)) return addr; @@ -368,6 +381,36 @@ static int 
shstk_disable(void) return 0; } +SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags) +{ + bool set_tok = flags & SHADOW_STACK_SET_TOKEN; + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + if (flags & ~SHADOW_STACK_SET_TOKEN) + return -EINVAL; + + /* If there isn't space for a token */ + if (set_tok && size < 8) + return -ENOSPC; + + if (addr && addr < SZ_4G) + return -ERANGE; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. + */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(addr, aligned_size, size, set_tok); +} + long shstk_prctl(struct task_struct *task, int option, unsigned long features) { if (option == ARCH_SHSTK_LOCK) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 33a0ee3bcb2e..392dc11e3556 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1058,6 +1058,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 45fa180cc56a..b12940ec5926 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 860b2dcf3ac4..cb9aebd34646 100644 --- 
a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -381,6 +381,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read); From patchwork Sun Mar 19 00:15:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180161 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A3D4C77B60 for ; Sun, 19 Mar 2023 00:16:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 04B48280020; Sat, 18 Mar 2023 20:16:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F1AED280021; Sat, 18 Mar 2023 20:16:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D68CB280020; Sat, 18 Mar 2023 20:16:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id BF3AC280001 for ; Sat, 18 Mar 2023 20:16:54 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 9996C120CA9 for ; Sun, 19 Mar 2023 00:16:54 +0000 (UTC) X-FDA: 80583732348.25.11E08FF Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf29.hostedemail.com (Postfix) with ESMTP id ADBB5120004 for ; Sun, 19 Mar 2023 00:16:52 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="Mzmc/Oss"; spf=pass (imf29.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) 
header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185012; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=PdS9YQ0k1vCI/xLvcPjO4/eFlX4M1vAT9LvhSm6ce6U=; b=Q1tksjg2NKHKzMfs31IQ8AHwWfUWYfDy4x00Xv9+0CyVR0rnQYH0Y8w6wuncNQRuuRP3ut T0ZUkrbk1nbA0T+uVLtARMhhVsHr26xW2H2+Ao5sFMHiNF/9TzfIex1Rc/hcbTvTHdGUjw +Pl0o9PTCKLHGZJYKk0HXhY2KIdglJY= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="Mzmc/Oss"; spf=pass (imf29.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185012; a=rsa-sha256; cv=none; b=7/Dqt/WMvYK3S74Aae6WBaDlQ6m+WAlyzZ539vlOya7ySJrECnt1yWJx0oRvt6i2oVRB6h zxgH2K1IHqRFA+TTV1iZRSSqSZp2XiceIrFpxFa/KPsGJ763McKScF+bMswgBLewb8w/bw rN0nnHqcLZMi544UyYEasrc3GnrTRrM= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185012; x=1710721012; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=hhZNEhMG5Ur6pBP04hNf9KVjJtSiNiJCBm2RUWbtaSg=; b=Mzmc/OsswJr9nJb7Y/EK+eA1P35A7XLgzN5hZIqgHncvWWXvuLaKQ3Wd OyMRnuR2QxkszsnNRdnQEvnsF4I2qyCFHTqQWcopLHk/MQ1NH91LI++YU VE6e9BnKeoo6RkeqvyWgCHr+mSVKCr+GH/qwLIuja0yl0Tmr2EX62r2bC rC01oNq+xGc8ap+bOUok+BZkgjFMoK+nJb9UFNmW6Py2pWEPpdfOpsCmc Tcgq/D3HVya5zB547IQKiyDfx/SH6mfrF6G50xYoUAHJJmTVp7DES4RPy WgFSLE+hWB+LDUnWpPmGZBr8vQEAdjycHWP+wlxbYP+nibRF1+YKQnQSy Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491502" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491502" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672971" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672971" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:50 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 34/40] x86/shstk: Support WRSS for userspace Date: Sat, 18 Mar 2023 17:15:29 -0700 Message-Id: <20230319001535.23210-35-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: ADBB5120004 X-Stat-Signature: zbzpk5xekfc1hhj5ujbzs743sybtuy16 X-Rspam-User: X-HE-Tag: 1679185012-630000 X-HE-Meta: 
U2FsdGVkX19aomiV37/dUWPlYdxEB2O7Mo6gN+ZDlVvUMNMMRdP9nHNq4YI3schHsZDrHteLkIwS9A0vSo/Ha5VMpTwW+b1xBVntXGUmRLuyKb6OP7pTdJPljFcGYD3RB0sayKr6h9ahNZACSeagGeM/Xnq4CMdhEfHIqKCM7jm3JjE4pea7RWtLlv/DdDeToWzkwG42W5y0LmBUw9pWP1Xl4zLi8PQxHj9Ko1d3QsQ7j88+SC+yVhyLiYpIbj6amGMYppYAG2Hquf3NsM2Kh2S2SXKQNHOb72uSxWDSH5dcj6FEGxEbmhSL73vYGirr50JZYRGm5iSwatuw3RGxtEQdszGQ4sT+/6tRVMbTf6V9niKZ+1Tcx7Zm2XEBK6CaRiS770ZDF65AuIj7sfipxIfeAUzK5BjaZLM91j5Lw2d1TlE5snwuSChRib0RKqCtP8rCbTKVZ4pnguuPQHFCz25XQRoGZEcHSfbOYuVWuvzTj4sMPnbTeU7mO0mMcnHHZIijkQqihdBlvi+eq9ShhEUk+YXuzlpd/rSTK3A3cMPmyvlswT0VDtxEMzYQ9kBp8i82uD8MPwJUvznMJYzwbU7DIJV+ce7EZ7sWknRfg4CE1fgvHybTh3/KkaJejeqWFGQ7pPvWXklRSfsLarPvbHmLr4qc4XO6/akva/SNliJieFKAWvn7WBLhxFLnfLaim1br0IYp10hfczsc1TV6Rb1LsQAPDt3kYXAjgcVQgC5NV6zP5MXtCrbN0R2/siObagLwShuOcH/2aiMwSF7aKQHH9MWzsJpp2HttV/PZhTFlFaA5lNgIdJCDUujzZsw3mPgC6wVnnhvs4mnO5ZjbQAL67AilDXZXcWAJMzdQQHfM0wbogB3E4GkzjrT2y69TWS8e5tVuqw/aZ3Y0leQi4B2M3j7tD6myMi4OdvU5QpYqYYS+XJZ1m0l1MRO+gmpN2a0fRiD+1EwXhQBGDDA zFUcqxzR CCQx/EZlq+fShIlaWMuAym7ocnqUTjZ+jao+x8JiZBxZQJh4F4NrnZ9P9nnw/F14wPuiooIRtUHs+MMzLqIw8p0DKymKoll1iJaiG2MOEaWIAvvREYjG7EIDbigeM9EdxAo85tgqdOamoN9Yg5Fmr4v3FGaxj5GCBY9ndBnJoOXir7cR06t7EPJVbApiyZv5yHbLE+q/eNDUaqFY8evSbPW4528sdt3TU4GUeJHQS1kFubG3u4TdxHgTojRYFDvdUjduEMQWAmSMvoZaLK6PTaO+GO7xDqj477Apr4RsXlmHBsMqaB+xJobYSLI1u5OQ6YYlnhEVN/k0iWmAdO96Yi/LWJQ4ezvgYEp8Q X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For the current shadow stack implementation, shadow stacks contents can't easily be provisioned with arbitrary data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, WRSS, which can be enabled to write directly to shadow stack memory from userspace. Allow it to get enabled via the prctl interface. 
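A minimal userspace sketch of that prctl path follows. It assumes the request code ARCH_SHSTK_ENABLE is 0x5001, which is defined elsewhere in the series and not shown in this patch, and it copies the ARCH_SHSTK_WRSS bit from the uapi hunk below. The function names are hypothetical. The sketch deliberately does not enable shadow stack itself and assumes the loader has already done so; per this patch the kernel refuses with EPERM otherwise.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumptions: values from this series, not guaranteed for merged kernels. */
#define SKETCH_ARCH_SHSTK_ENABLE	0x5001
#define SKETCH_ARCH_SHSTK_WRSS		(1ULL << 1)

/*
 * Hypothetical helper: issue one arch_prctl() shadow stack request.
 * Returns 0 on success or a negative errno.
 */
int shstk_feature_ctl(int option, unsigned long feature)
{
	/* A zero feature mask would be a caller bug. */
	assert(feature != 0);

#ifdef SYS_arch_prctl
	if (syscall(SYS_arch_prctl, option, feature) == -1)
		return -errno;
	return 0;
#else
	/* arch_prctl() only exists on x86. */
	(void)option;
	return -ENOSYS;
#endif
}

/*
 * Hypothetical helper: ask the kernel to allow WRSS for this thread.
 * Expected failures on kernels with this patch: -EOPNOTSUPP without user
 * shadow stack support, -EPERM when shadow stack is not enabled.
 */
int enable_wrss_sketch(void)
{
	return shstk_feature_ctl(SKETCH_ARCH_SHSTK_ENABLE,
				 (unsigned long)SKETCH_ARCH_SHSTK_WRSS);
}
```

Kernels without this series are expected to reject the unknown request code, so callers should treat any nonzero return as "WRSS unavailable" and fall back to the kernel-provisioned token path.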
Only enable the userspace WRSS instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. From a fault handler perspective, WRSS will behave very similar to WRUSS, which is treated like a user access from a #PF err code perspective. Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v8: - Update commit log verbiage (Boris) - Drop set_clr_bits_msrl() (Boris) - Fix comments wrss->WRSS (Boris) v6: - Make set_clr_bits_msrl() avoid side effects in 'msr' v5: - Switch to EOPNOTSUPP - Move set_clr_bits_msrl() to patch where it is first used - Commit log formatting v3: - Make wrss_control() static - Fix verbiage in commit log (Kees) --- arch/x86/include/uapi/asm/prctl.h | 1 + arch/x86/kernel/shstk.c | 43 ++++++++++++++++++++++++++++++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 7dfd9dc00509..e31495668056 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -28,5 +28,6 @@ /* ARCH_SHSTK_ features bits */ #define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 6d2531ce661c..01b45666f1b6 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -360,6 +360,47 @@ void shstk_free(struct task_struct *tsk) unmap_shadow_stack(shstk->base, shstk->size); } +static int wrss_control(bool enable) +{ + u64 msrval; + + if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) + return -EOPNOTSUPP; + + /* + * Only enable WRSS if shadow stack is enabled. If shadow stack is not + * enabled, WRSS will already be disabled, so don't bother clearing it + * when disabling. 
+ */ + if (!features_enabled(ARCH_SHSTK_SHSTK)) + return -EPERM; + + /* Already enabled/disabled? */ + if (features_enabled(ARCH_SHSTK_WRSS) == enable) + return 0; + + fpregs_lock_and_load(); + rdmsrl(MSR_IA32_U_CET, msrval); + + if (enable) { + features_set(ARCH_SHSTK_WRSS); + msrval |= CET_WRSS_EN; + } else { + features_clr(ARCH_SHSTK_WRSS); + if (!(msrval & CET_WRSS_EN)) + goto unlock; + + msrval &= ~CET_WRSS_EN; + } + + wrmsrl(MSR_IA32_U_CET, msrval); + +unlock: + fpregs_unlock(); + + return 0; +} + static int shstk_disable(void) { if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK)) @@ -376,7 +417,7 @@ static int shstk_disable(void) fpregs_unlock(); shstk_free(current); - features_clr(ARCH_SHSTK_SHSTK); + features_clr(ARCH_SHSTK_SHSTK | ARCH_SHSTK_WRSS); return 0; } From patchwork Sun Mar 19 00:15:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180163 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04EF7C77B6D for ; Sun, 19 Mar 2023 00:16:58 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8B36D280022; Sat, 18 Mar 2023 20:16:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 70147280001; Sat, 18 Mar 2023 20:16:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 468B1280022; Sat, 18 Mar 2023 20:16:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 2F8CE280001 for ; Sat, 18 Mar 2023 20:16:56 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 071D1A02F7 for ; Sun, 19 Mar 
2023 00:16:56 +0000 (UTC) X-FDA: 80583732432.05.75C6550 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf26.hostedemail.com (Postfix) with ESMTP id 06FD9140003 for ; Sun, 19 Mar 2023 00:16:53 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=P7GzbI+X; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185014; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=NjfnKDB6b0HKeEX/PKHPZraQnlxUrPOXKdybnz7FfaU=; b=PI/aQd7SpmnwS4s35Lp0fe8fvfy4QbxcAW5xdyxe83+WKc2M4j3xQo/zv0Kq73PZitK+6x xAym0gW/xGT6HMMWKukdNV/ATsMYud16syBiYdbkc87ugE3KziYttTvGrRNJ7FVe28IYE7 t5DLez3Y4yKoBs/j7wPTX8HajNBNDg0= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=P7GzbI+X; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185014; a=rsa-sha256; cv=none; b=2QmoyY9fHFm/b9JKAmWOB2dwTZd3RcyIBARjmKXV2c4JVxLipUZogtAt/sEARJxpuHrT85 ygfpP/xZyMjiROQ/Rhw50wJ0TV9/p8+8EyOq06Zh4l1nMb1jNE6AmXzhDjemCVzXDhxk3Z YfcXwzD+E2EnlGMPaaYGLPo74u2yKKk= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185014; x=1710721014; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=SZKEydO4/dqCRXo76Ax/xNMIP87QKuRYX84LlmCxycM=; b=P7GzbI+XgPTG+LypVqNk7fSnc3uGjHbrjYn4ZcfTDNT41DPYUiz9X99G 
z/RiQ8YRiWhVM5fOkBkpu6NxF+PUuj2yyb8iUSnyLB6znDjgeIPTHEC+W Kg2B9LxZYbu43kz8JknLOpOHj1W6WngIuJo0DCr55kmjVUi7R4Y8Xxmjs goywl4vWoHvdeDMuRAGx19LIpkOJBM9ikYTIuaFhwfQdjF7jDhqa4oDy0 NJ/sK6J8Fb61u+KSiF9+/GC8SRha6M08yPQNjEc4ne2AAudNLrHKxGlzZ jz04fO3GkDM01l5527wGHZW6XgwVO1Ms2+tGuXHMhRpO7S0lCvqtfyxi3 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491524" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491524" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672980" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672980" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:51 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 35/40] x86: Expose thread features in /proc/$PID/status Date: Sat, 18 Mar 2023 17:15:30 -0700 Message-Id: <20230319001535.23210-36-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 06FD9140003 X-Stat-Signature: mwzf7dksk6bxuc9h6r9bf9rhrtw6pt45 X-Rspam-User: X-HE-Tag: 1679185013-182694 X-HE-Meta: U2FsdGVkX18t1IWUOjWEr9ChM/SmFu1eBfWoqzRaKLvmWmpbYZrpsq1pdulDXPnACRlAV7Xq/68nPeYQWxew4SgHDSvBSIyF8lEW65Tt0sdTeWuXRJXkzByG4FUpTQlXIFijmOz/LhVNZheRBEzuBubiXFvu4aTh0uDRjjhFiGiSAMMyNFVxwZnDg77qfCTq0sKTsUwLvxtj093WqqLMCqufS8dfeF3bZzwQDXQ8O5p+OObhBMPCpsd2pXPhPEwB4/e33GAfwP+MWpf+Ri4xAH9zmlqvhtO8qF0+G6a1uFTc1bjXq9+c6u8VmGJ/yVM7fBHSeAPDc7pr+83+Or88l+MeXtcQrns9q/uRbC10UOgABmkLjfGxCekwrzt+DOGaqRjur81cLpXLQSliIY0v9FYn/Z5x/9TPIarBTp1VqOvOLLCcLQL9HYXcbx2olwRibp5orX/qnlNwR5aDqdEgznsoGxsLbpC67V5HqsHPy2jZQbUXQjlVjFA4NVveUbWS9EaC/DKquVotZH7mJaLn6VCal260LQeeUrVw2QMh4ozgRF9lECvi2pNFBnMSiLBO0zETsuTkkYZK+AmoVkF7n2GtCRnA7NRh38s0avHP0dlGypVvWjjiUY+dmf3XHP/DfEsRlcrstFcsfdF4Rpjyh4jA0dqFO0ndJgPUojU6Hl+ulqOu8S9JohT3NMw4OOGUxKHMKZ8gEcSOfd8HSWEjOd2f8h9ocHqE/CS2K9AQU811iedfibB+pdpbsLnlNBHmpYgOoMSTBp7u689GVOhzAzPvUoIB26xLRX2ugaghJCAh58wSSF5bUsO42vFU2OcbfVgbsIfc6NUBw1VTgCGYt8eeNFef58aw81e1AezSjJpk/pO9cbafIUr6hJyXwHbG1AiQT9ss3XPvg6K+t0B4xRox2R1IITz2lSo8jFm9nPKyBf+t0eadeY8vuCTS9hbvOZ8sUAX3JrKjeCeDpfe VxKDdjQ+ 
wW5Ro4zwPBkpEcw1bNnbmQK71p6miWV6W4Lj3DH1fnfi20dcvx7ltw79pb0I7te3qptmSEHSniyIH3Q2QyMLR6PmjfvyknMyacI60svXYnd03/PSKPwFClBQIl8ON/W58kJVLWNH+AP5Yedcdl0YvRkitjRVO1HugVNJK4mDyBLNIRxM3sWSN7foK1pyD1yxK+ExEbSHEcm7xpAWPM2rcWWLqQpcFu7CTBp7+/B54DkJN6qEj/9M73eEv6cPbkbFLNGfnCn4delSUfYVAhsGnita+o/Zw7ZuFodH8wWDCZDxlvIJr5/IHufLKWZTn8vFEbt/0WT0ikiB65eoPPxdKFGVE56/P56phZ5ni0cYdSN2LLKOP9929ei6kDkngmOyxzYVELOznWO22twc= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Applications and loaders can have logic to decide whether to enable shadow stack. They usually don't report whether shadow stack has been enabled or not, so there is no way to verify whether an application actually is protected by shadow stack. Add two lines in /proc/$PID/status to report enabled and locked features. Since, this involves referring to arch specific defines in asm/prctl.h, implement an arch breakout to emit the feature lines. [Switched to CET, added to commit log] Co-developed-by: Kirill A. Shutemov Signed-off-by: Kirill A. 
Shutemov Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v4: - Remove "CET" references v3: - Move to /proc/pid/status (Kees) v2: - New patch --- arch/x86/kernel/cpu/proc.c | 23 +++++++++++++++++++++++ fs/proc/array.c | 6 ++++++ include/linux/proc_fs.h | 2 ++ 3 files changed, 31 insertions(+) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 099b6f0d96bd..31c0e68f6227 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -4,6 +4,8 @@ #include #include #include +#include +#include #include "cpu.h" @@ -175,3 +177,24 @@ const struct seq_operations cpuinfo_op = { .stop = c_stop, .show = show_cpuinfo, }; + +#ifdef CONFIG_X86_USER_SHADOW_STACK +static void dump_x86_features(struct seq_file *m, unsigned long features) +{ + if (features & ARCH_SHSTK_SHSTK) + seq_puts(m, "shstk "); + if (features & ARCH_SHSTK_WRSS) + seq_puts(m, "wrss "); +} + +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task) +{ + seq_puts(m, "x86_Thread_features:\t"); + dump_x86_features(m, task->thread.features); + seq_putc(m, '\n'); + + seq_puts(m, "x86_Thread_features_locked:\t"); + dump_x86_features(m, task->thread.features_locked); + seq_putc(m, '\n'); +} +#endif /* CONFIG_X86_USER_SHADOW_STACK */ diff --git a/fs/proc/array.c b/fs/proc/array.c index 9b0315d34c58..3e1a33dcd0d0 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -423,6 +423,11 @@ static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm) seq_printf(m, "THP_enabled:\t%d\n", thp_enabled); } +__weak void arch_proc_pid_thread_features(struct seq_file *m, + struct task_struct *task) +{ +} + int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { @@ -446,6 +451,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns, task_cpus_allowed(m, task); cpuset_task_status_allowed(m, 
task); task_context_switch_counts(m, task); + arch_proc_pid_thread_features(m, task); return 0; } diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 0260f5ea98fe..80ff8e533cbd 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -158,6 +158,8 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task); #endif /* CONFIG_PROC_PID_ARCH_STATUS */ +void arch_proc_pid_thread_features(struct seq_file *m, struct task_struct *task); + #else /* CONFIG_PROC_FS */ static inline void proc_root_init(void) From patchwork Sun Mar 19 00:15:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180164 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62E4BC761A6 for ; Sun, 19 Mar 2023 00:16:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 77BEC280023; Sat, 18 Mar 2023 20:16:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 61749280001; Sat, 18 Mar 2023 20:16:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 41B71280023; Sat, 18 Mar 2023 20:16:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 29AF6280001 for ; Sat, 18 Mar 2023 20:16:58 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 01F4880CAF for ; Sun, 19 Mar 2023 00:16:58 +0000 (UTC) X-FDA: 80583732516.27.3660E1B Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf26.hostedemail.com (Postfix) with ESMTP id 21C0C140003 for ; 
Sun, 19 Mar 2023 00:16:55 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=i6l+xyb8; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185016; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=gQxmIVCDyiXNijRz9GDbHj/9hmkObe8BeLOXxhdnjvY=; b=rTkW2cbXYg/7wo/hwMaVQad7heNUDD8hU3GAveXr2xASU7JDbeyBwGk/zsLvxm2GIXzG8O hgRexB8sBMrwhe5BNLcXCZ6Cn021R/Fou1TmHApLwvYm5EiYa1sD1zd72CpyWKlzCj3wfS op/iadLt6/Q6bXGQRCgsa7AyhmxYa98= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=i6l+xyb8; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185016; a=rsa-sha256; cv=none; b=D6NQjKzbgc76MxeywXtv7eX1d3z+MXfEUI48sGCExy7nxpb1i0/MTXBv9tRzPW/0OnDYaQ KxEFBhjhq04x3GZBzi//5wfnxmwWfjLLavydPrZsWW7MIozkdSshcxH3GXHg0dGU3BElsR 5w3pSiDeLbKjoMsAfDPoT13RQKZiczs= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185016; x=1710721016; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=XWhd33s80bJV+B2OfEZjFLW5DvuhEuOlL0wc6R5POX8=; b=i6l+xyb8qVDiWcZq04Z/9eBXOqgBwnIqYkzpC3izi7cMBI2XG5j4g2vD g13O4gZAdjJT27lun1L/dQM37tTdMg+NZL4+N5UTjbX/X4CdIEbBZ83vs b/ziaq+gdswhQs2qZ7I/BVxxnRfTpAoOAcRkllsMNJf5GCIf1+6uIAKtk jNrCc8/VRL8v9ThmIzQ1vV1WX9c1ri5oU2r4AsTtlBeVWCvAWq4FQptdP 
QiDNgSUhKN1FKDs9aZDUtVav1Dzt5nAAl0M5hkg+0SBLUqRtblSXe/wFt QCF9t4jMyYl11BPbpHWBF8egI9CawUGe8yxdlsJy8whpRGVMStJ7wScxy w==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491547" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491547" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672989" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672989" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:53 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com Subject: [PATCH v8 36/40] x86/shstk: Wire in shadow stack interface Date: Sat, 18 Mar 2023 17:15:31 -0700 Message-Id: <20230319001535.23210-37-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 21C0C140003 X-Stat-Signature: qo49hcc5jqjjuzyr6ddhwtexwymx1ycz X-Rspam-User: X-HE-Tag: 1679185015-168953 X-HE-Meta: U2FsdGVkX1/0SQ54oO6zNrTGp6FqQqIUd6z7r14e4sQxNm+sCZ2gBDPV8hVsJWvZ/tapSlXEaOMYo2lSyUrDsQiNXd/Vw7bYkRw2uZOw3LIjXQgJadDYx7kAIsrqPz4MdhWvCBu/OrwaLwYMTp4Z8M13kAYFA3RjZ2AVVaF47R9PvB8Fj8aBzjP7uSd7s9V0qmH+NXbzgSoxN53V07Oj7wAyyHiJJnbx7fQPuViBkhITZeeorXbcrwHekRGM0VD+l46MRmTNKCzZSE9C5PNZ5qJqJc2UtTU/lXqTsrxFzlyueqw6D6vNkxh/vYcqHJrT3QWdYoB8rE7IG60VPsbgokBM0peCmLVHS3zZKlfGFOZSql4pjKvaytLrG3G94UsrFkCQO8GJWl7a5nkIwuFLavB3/wCl7SmmEVu4GNUDp1MLuRxV/9fN04a/eGTehrvOEknubZElEoTcfCSM8oGjaNBzCQ3esv/teumLwl6GhgfslFIu0in5YjC0bp/MX1fAXzbMBt1FDpjUTAcs4/toixLGvqiaPs36ZNofp+VhBzsIBR45GpE0HAHAg9dRUvRxnoQiBBKCK2zV4VhOEXXXs3mRQ/jLTAYlQZWp0ad68hOSrZ/OsA300IXdNicw+wt7YnKXgVxNeMONY37QiWaUumJO/I8qOhnJGP2Shl+jSWWpqS0BQLgRE0UP4E+NkIY4oX3pdhZ7QJiEVdPixglbQ6Z1CHVhmRKZlc1GffUkscXKEhT3oeRmsqCgUvMM3a5oJDq5M2bOQE+v3yd090RxMwpGohC+cpZvXxfOpYMbLlR7wrJrl4CLo2D6ptkr/PdGxE/i5xR+XiXUSV4pfxF7H4l2h3/81BuTuVqo9zSwYTgBe5zHXAUuMwbz7DTlQ8d+OoM9csX5b4Xmd2gFhvJpnIiv5kijxs1Ksm9T2k/ZOqpSfxZI5w+E7tigeQS2T0ujF0g5Ckc3LRiy46L0qJT MjinZQ4f 
xc0DARnPnr89IOolzKR+D7zMLMyzdeow73oLUulS4IK2OnpOno40Sl4Zk1ctB5dcv7ipvVgdc32t57LTvWoZxcJ39LVd1vaUKszmwhOjf/hdQ5yEGAAX62NRIeHLdr/m7hVXMGjeyEK6vQABMMswf9EEq2xJl7Pz0U6FEVyiEifTpS5MV2pECAemReBPnx/R1S5dbe57nlZimfgTVlK4tO6qI3++aJtXJvhO5FCyJHu6DUKU09i/3mCgLzUBRGsp4ZxM3SW+xOrzbXr++EODDxKyHhXMy3vF/SQw4cmOtM8Lofsd9PN8y2IUkWmERTit0ol7ctIM3Qo5gR62EbWlGNNGIsvyNvdYBo0yV X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The kernel now has the main shadow stack functionality to support applications. Wire in the WRSS and shadow stack enable/disable functions into the existing shadow stack API skeleton. Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: Kees Cook --- v4: - Remove "CET" references v2: - Split from other patches --- arch/x86/kernel/shstk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 01b45666f1b6..ee89c4206ac9 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -472,9 +472,17 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features) return -EINVAL; if (option == ARCH_SHSTK_DISABLE) { + if (features & ARCH_SHSTK_WRSS) + return wrss_control(false); + if (features & ARCH_SHSTK_SHSTK) + return shstk_disable(); return -EINVAL; } /* Handle ARCH_SHSTK_ENABLE */ + if (features & ARCH_SHSTK_SHSTK) + return shstk_setup(); + if (features & ARCH_SHSTK_WRSS) + return wrss_control(true); return -EINVAL; } From patchwork Sun Mar 19 00:15:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180165 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org 
[205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CEDDBC7618A for ; Sun, 19 Mar 2023 00:17:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 33554280024; Sat, 18 Mar 2023 20:17:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 247D6280001; Sat, 18 Mar 2023 20:17:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC889280024; Sat, 18 Mar 2023 20:16:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id CD1D3280001 for ; Sat, 18 Mar 2023 20:16:59 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id AC5CBAB0AC for ; Sun, 19 Mar 2023 00:16:59 +0000 (UTC) X-FDA: 80583732558.08.DA13E69 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf01.hostedemail.com (Postfix) with ESMTP id 80FF440016 for ; Sun, 19 Mar 2023 00:16:57 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=OF5q3dGI; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185017; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=JpMzkk+jy0rJ6f1elZMSo7lNRiY2NQ3lwqoaaYq6es4=; b=D63/qLh68kaPmCCLDtHX861orZEPf+KBmV6dBMexE4Tz8KBwjJF31B98fcSO+JL0IyknQs R7kOuwLsDK2pe3euOd+hWVJY2AjhNBJWNk0QMp8+vUcM1vZfA5PGdRFBWcejAmJwSSQpnq 8XaoE2uBAbrCsFOn9+JupdAXuoNFUXs= ARC-Authentication-Results: i=1; imf01.hostedemail.com; 
dkim=pass header.d=intel.com header.s=Intel header.b=OF5q3dGI; dmarc=pass (policy=none) header.from=intel.com; spf=pass (imf01.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185017; a=rsa-sha256; cv=none; b=F6yPb8tsN3Pe79kpzQ9/nI7vTMPqC9i5RQulGdDQttnLgS8VNBsHzjUpL1TKnoqzDrpoMG NuqCTgXXj4BdJxjPiO2KoVcF67fPxBp4L3pLZAL0xc/pld68MN00KfySvXMjqd9W9/FXja N9XlyObYzK5xg1CelEko5geqS1KFUhQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185017; x=1710721017; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vX8wXG9P6QSNMOt0GN0vBlDeSdgKUz2zkT7fbKn25E0=; b=OF5q3dGIbOC9OoZI+V+0ITY3k+VP+h0pcMCNrvAzBPTzHFpi7B60/Xsu nfEGCbRPEQ9e1HOfcBXgFJHIG5xEhElcJcUTGweiY8pTLaxSRxcqBa/Oe 6GvMSXBN+K6ZU+yWIRyDQ0jEOOATszn7pE0oD2lK3zrjefitBZLtZvjrn sSA8flaM2fPz8ynbH1ghAkopOPbDcv9miJX4GLKERg2W+v8FrDDvNxEMf US5rLy6H+UgkzzJhQmOjU5CTko2jZwkS4cYyKYFgw1Vo1mZ13x8oKULD4 zi3q8sntPa0Q/q7jg1MoFjYO3xQ7piKo7x7TEQw3CT76iDR9kF2Fbegfn A==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491569" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491569" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749672998" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749672998" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:54 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . 
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 37/40] selftests/x86: Add shadow stack test Date: Sat, 18 Mar 2023 17:15:32 -0700 Message-Id: <20230319001535.23210-38-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 80FF440016 X-Stat-Signature: chnatye9dsriwp5s1eajjinjw4o89ujw X-HE-Tag: 1679185017-122455 X-HE-Meta: 
U2FsdGVkX1+ioVzQ+YRyTc+28bjA7VzyE/1Q1nGreS86fzrlrE9kwOF9bwlEQyBexb/XJlJL5ELTqkKGDOVTBv6HWv3muXm4SC+2dP0DFMd8XcyywvAHGvQNyb7fwN1mBfITkrMSOzZRdqRpFnBbKw092IvEbP3C3fzMmt1Hw0NfV/Y1HQxO7NMop0OHFNKC/Vqr8RTgqkd4IcOi2DYjbQOjDd/DnunA2RCTFnDsZQyiP9sjPg6QZX3uti7XTHYC17o0BG2HS19Jn0KxYfFnfXD9xVvQCG/2agUjYt7obPYtx2KA3DA5clqzYcED+FKfP/ZrddcSF/TBPQINbcBdWAfSWI8mqpV86apbTyiDv43ciDUnAu2nABYqGB1P70zFjbXFg9NAEi03pLnoc4iAEeuFGJSz+91hGvQKUqONmFEtwobIDRfnyIMdn3eu88cGBC/6I6wzAv9e9DxptNU8ohDdYl1DW39KbNjzk4SAc+4IqF1wzUZpElhEHL7MY2e6zx13RRiy6IcxrRX5hppepBjKsw+yJv1cGt2QwTDP+OnJ+s84cpYrhVxE+O1zpu5/Nn58k+8d7bkEy5w1s8kMTusLv01CfygxWph+Jy1ufXTCLykLvv1yIBQ5Y1tRe/krzFpyoirv7Wr9ADLZat5I47FClmbN4fkYbCEF7zMGHmGWKgNxSRGjpyVqp4a3oU6c5+lxsO753qSfqjziiNW/P+53bpjE/5304bVguhXCZmBUrup5wytGfJTd/GBsnnsHePVy8nuNW0LZ/nMcqfi92kIfTL/aPq0V6tPzNgShweOHCtjxXdMcgs1ieyldu9DPWyYroG/UxFGkP8+1tSgudkzfxYZvKQlCxIroFe3J6oZ8INWtWdffxaujXpWGF1tU5SbJ+SQeBvXYD+CzU9V1TAY23DHyWP0/7urS/vC1Ia2rf14U2qBL3L8uc36W6au15RrWXJhwEIu0yEAsEcG 2ls4bHVm u2FGEdYuX5HGqHZqzHEvT9cEXp7fMJST8p49vcUE0xewHxQeas3F/uuMqfend95sTY8/DGaCkyVFW3yevuMjFWYCn/cVRVwPbsqsbS+ajQBEmc5IW/d48diMxPQLMUYQ9yW3i5aJ8+OSAzH+GnLdIP+0E95RcxQCyQK+qdpIZvwbKFUk8swIgQ2nrKNPsh8t9IcmC0a07Ix/WODJNz07HbT9txH2ifJzmo90wZR1TzGloevZzVqH9uzg0EH8y/BNXznrUnIWjjiaPRE5kcmNJI9ZI7EW5I6DWo9qwTh54BuWqILQBHRw5ZtZlKT7JFuHmV5mfTCaX5pUJBqf1lDJm2+zEFIfEgYv+Qv+0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a simple selftest for exercising some shadow stack behavior: - map_shadow_stack syscall and pivot - Faulting in shadow stack memory - Handling shadow stack violations - GUP of shadow stack memory - mprotect() of shadow stack memory - Userfaultfd on shadow stack memory - 32 bit segmentation Co-developed-by: Yu-cheng Yu Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Reviewed-by: Kees Cook Acked-by: Mike Rapoport (IBM) Tested-by: Pengfei Xu Tested-by: John Allen Tested-by: 
Kees Cook --- v7: - Remove KHDR_INCLUDES and just add a copy of the defines (Boris) v6: - Tweak mprotect test - Code style tweaks v5: - Update 32 bit signal test with new ABI and better asm v4: - Add test for 32 bit signal ABI blocking --- tools/testing/selftests/x86/Makefile | 2 +- .../testing/selftests/x86/test_shadow_stack.c | 695 ++++++++++++++++++ 2 files changed, 696 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index ca9374b56ead..cfc8a26ad151 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -18,7 +18,7 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer TARGETS_C_64BIT_ONLY := fsgsbase sysret_rip syscall_numbering \ - corrupt_xstate_header amx + corrupt_xstate_header amx test_shadow_stack # Some selftests require 32bit support enabled also on 64bit systems TARGETS_C_32BIT_NEEDED := ldt_gdt ptrace_syscall diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c new file mode 100644 index 000000000000..94eb223456f6 --- /dev/null +++ b/tools/testing/selftests/x86/test_shadow_stack.c @@ -0,0 +1,695 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * This program tests basic kernel shadow stack support. It enables shadow + * stack manually via arch_prctl(), instead of relying on glibc. Its + * Makefile doesn't compile with shadow stack support, so it doesn't rely on + * any particular glibc. As a result it can't do any operations that require + * special glibc shadow stack support (longjmp(), swapcontext(), etc). Just + * stick to the basics and hope the compiler doesn't do anything strange. 
+ */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Define the ABI defines if needed, so people can run the tests + * without building the headers. + */ +#ifndef __NR_map_shadow_stack +#define __NR_map_shadow_stack 451 + +#define SHADOW_STACK_SET_TOKEN (1ULL << 0) + +#define ARCH_SHSTK_ENABLE 0x5001 +#define ARCH_SHSTK_DISABLE 0x5002 +#define ARCH_SHSTK_LOCK 0x5003 +#define ARCH_SHSTK_UNLOCK 0x5004 +#define ARCH_SHSTK_STATUS 0x5005 + +#define ARCH_SHSTK_SHSTK (1ULL << 0) +#define ARCH_SHSTK_WRSS (1ULL << 1) +#endif + +#define SS_SIZE 0x200000 + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("[SKIP]\tCompiler does not support CET.\n"); + return 0; +} +#else +void write_shstk(unsigned long *addr, unsigned long val) +{ + asm volatile("wrssq %[val], (%[addr])\n" + : "=m" (*addr) + : [addr] "r" (addr), [val] "r" (val)); +} + +static inline unsigned long __attribute__((always_inline)) get_ssp(void) +{ + unsigned long ret = 0; + + asm volatile("xor %0, %0; rdsspq %0" : "=r" (ret)); + return ret; +} + +/* + * For use in inline enablement of shadow stack. + * + * The program can't return from the point where shadow stack gets enabled + * because there will be no address on the shadow stack. So it can't use + * syscall() for enablement, since it is a function. + * + * Based on code from nolibc.h. Keep a copy here because this can't pull in all + * of nolibc.h. 
+ */ +#define ARCH_PRCTL(arg1, arg2) \ +({ \ + long _ret; \ + register long _num asm("eax") = __NR_arch_prctl; \ + register long _arg1 asm("rdi") = (long)(arg1); \ + register long _arg2 asm("rsi") = (long)(arg2); \ + \ + asm volatile ( \ + "syscall\n" \ + : "=a"(_ret) \ + : "r"(_arg1), "r"(_arg2), \ + "0"(_num) \ + : "rcx", "r11", "memory", "cc" \ + ); \ + _ret; \ +}) + +void *create_shstk(void *addr) +{ + return (void *)syscall(__NR_map_shadow_stack, addr, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +void *create_normal_mem(void *addr) +{ + return mmap(addr, SS_SIZE, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); +} + +void free_shstk(void *shstk) +{ + munmap(shstk, SS_SIZE); +} + +int reset_shstk(void *shstk) +{ + return madvise(shstk, SS_SIZE, MADV_DONTNEED); +} + +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp; + + printf("[INFO]\tnew_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp = get_ssp(); + printf("[INFO]\tchanging ssp from %lx to %lx\n", ssp, new_ssp); + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + printf("[INFO]\tssp is now %lx\n", get_ssp()); + + /* Switch back to original shadow stack */ + ssp -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp)); + asm volatile("saveprevssp"); +} + +int test_shstk_pivot(void) +{ + void *shstk = create_shstk(0); + + if (shstk == MAP_FAILED) { + printf("[FAIL]\tError creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + free_shstk(shstk); + + printf("[OK]\tShadow stack pivot\n"); + return 0; +} + +int test_shstk_faults(void) +{ + unsigned long *shstk = create_shstk(0); + + /* Read shadow stack, test if it's zero to not get read optimized out */ + if (*shstk != 0) + goto err; + + /* Wrss memory that was already read. */ + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + /* Page out memory, so we can wrss it again. 
*/ + if (reset_shstk((void *)shstk)) + goto err; + + write_shstk(shstk, 1); + if (*shstk != 1) + goto err; + + printf("[OK]\tShadow stack faults\n"); + return 0; + +err: + return 1; +} + +unsigned long saved_ssp; +unsigned long saved_ssp_val; +volatile bool segv_triggered; + +void __attribute__((noinline)) violate_ss(void) +{ + saved_ssp = get_ssp(); + saved_ssp_val = *(unsigned long *)saved_ssp; + + /* Corrupt shadow stack */ + printf("[INFO]\tCorrupting shadow stack\n"); + write_shstk((void *)saved_ssp, 0); +} + +void segv_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tGenerated shadow stack violation successfully\n"); + + segv_triggered = true; + + /* Fix shadow stack */ + write_shstk((void *)saved_ssp, saved_ssp_val); +} + +int test_shstk_violation(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = segv_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* Make sure segv_triggered is set before violate_ss() */ + asm volatile("" : : : "memory"); + + violate_ss(); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack violation test\n"); + + return !segv_triggered; +} + +/* Gup test state */ +#define MAGIC_VAL 0x12345678 +bool is_shstk_access; +void *shstk_ptr; +int fd; + +void reset_test_shstk(void *addr) +{ + if (shstk_ptr) + free_shstk(shstk_ptr); + shstk_ptr = create_shstk(addr); +} + +void test_access_fix_handler(int signum, siginfo_t *si, void *uc) +{ + printf("[INFO]\tViolation from %s\n", is_shstk_access ? 
"shstk access" : "normal write"); + + segv_triggered = true; + + /* Fix shadow stack */ + if (is_shstk_access) { + reset_test_shstk(shstk_ptr); + return; + } + + free_shstk(shstk_ptr); + create_normal_mem(shstk_ptr); +} + +bool test_shstk_access(void *ptr) +{ + is_shstk_access = true; + segv_triggered = false; + write_shstk(ptr, MAGIC_VAL); + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool test_write_access(void *ptr) +{ + is_shstk_access = false; + segv_triggered = false; + *(unsigned long *)ptr = MAGIC_VAL; + + asm volatile("" : : : "memory"); + + return segv_triggered; +} + +bool gup_write(void *ptr) +{ + unsigned long val = MAGIC_VAL; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (write(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +bool gup_read(void *ptr) +{ + unsigned long val; + + lseek(fd, (unsigned long)ptr, SEEK_SET); + if (read(fd, &val, sizeof(val)) < 0) + return 1; + + return 0; +} + +int test_gup(void) +{ + struct sigaction sa = {}; + int status; + pid_t pid; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + return 1; + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (test_shstk_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> shstk access success\n"); + + reset_test_shstk(0); + if (gup_read(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup read -> write access success\n"); + + reset_test_shstk(0); + if (gup_write(shstk_ptr)) + return 1; + if (!test_write_access(shstk_ptr)) + return 1; + printf("[INFO]\tGup write -> write access success\n"); + + close(fd); + + /* COW/gup test */ + reset_test_shstk(0); + pid = fork(); + if 
(!pid) { + fd = open("/proc/self/mem", O_RDWR); + if (fd == -1) + exit(1); + + if (gup_write(shstk_ptr)) { + close(fd); + exit(1); + } + close(fd); + exit(0); + } + waitpid(pid, &status, 0); + if (WEXITSTATUS(status)) { + printf("[FAIL]\tWrite in child failed\n"); + return 1; + } + if (*(unsigned long *)shstk_ptr == MAGIC_VAL) { + printf("[FAIL]\tWrite in child wrote through to shared memory\n"); + return 1; + } + + printf("[INFO]\tCow gup write -> write access success\n"); + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tShadow stack gup test\n"); + + return 0; +} + +int test_mprotect(void) +{ + struct sigaction sa = {}; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* mprotect a shadow stack as read only */ + reset_test_shstk(0); + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* try to wrss it and fail */ + if (!test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to read-only memory succeeded\n"); + return 1; + } + + /* + * The shadow stack was reset above to resolve the fault; make the new one + * read-only. 
+ */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_READ) failed\n"); + return 1; + } + + /* then back to writable */ + if (mprotect(shstk_ptr, SS_SIZE, PROT_WRITE | PROT_READ) < 0) { + printf("[FAIL]\tmprotect(PROT_WRITE) failed\n"); + return 1; + } + + /* then wrss to it and succeed */ + if (test_shstk_access(shstk_ptr)) { + printf("[FAIL]\tShadow stack access to mprotect() writable memory failed\n"); + return 1; + } + + free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + printf("[OK]\tmprotect() test\n"); + + return 0; +} + +char zero[4096]; + +static void *uffd_thread(void *arg) +{ + struct uffdio_copy req; + int uffd = *(int *)arg; + struct uffd_msg msg; + + if (read(uffd, &msg, sizeof(msg)) <= 0) + return (void *)1; + + req.dst = msg.arg.pagefault.address; + req.src = (__u64)zero; + req.len = 4096; + req.mode = 0; + + if (ioctl(uffd, UFFDIO_COPY, &req)) + return (void *)1; + + return (void *)0; +} + +int test_userfaultfd(void) +{ + struct uffdio_register uffdio_register; + struct uffdio_api uffdio_api; + struct sigaction sa = {}; + pthread_t thread; + void *res; + int uffd; + + sa.sa_sigaction = test_access_fix_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + printf("[SKIP]\tUserfaultfd unavailable.\n"); + return 0; + } + + reset_test_shstk(0); + + uffdio_api.api = UFFD_API; + uffdio_api.features = 0; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) + goto err; + + uffdio_register.range.start = (__u64)shstk_ptr; + uffdio_register.range.len = 4096; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) + goto err; + + if (pthread_create(&thread, NULL, &uffd_thread, &uffd)) + goto err; + + reset_shstk(shstk_ptr); + test_shstk_access(shstk_ptr); + + if (pthread_join(thread, &res)) + goto err; + + if (test_shstk_access(shstk_ptr)) + goto err; + + 
free_shstk(shstk_ptr); + + signal(SIGSEGV, SIG_DFL); + + if (!res) + printf("[OK]\tUserfaultfd test\n"); + return !!res; +err: + free_shstk(shstk_ptr); + close(uffd); + signal(SIGSEGV, SIG_DFL); + return 1; +} + +/* + * Too complicated to pull it out of the 32 bit header, but also get the + * 64 bit one needed above. Just define a copy here. + */ +#define __NR_compat_sigaction 67 + +/* + * Call 32 bit signal handler to get 32 bit signals ABI. Make sure + * to push the registers that will get clobbered. + */ +int sigaction32(int signum, const struct sigaction *restrict act, + struct sigaction *restrict oldact) +{ + register long syscall_reg asm("eax") = __NR_compat_sigaction; + register long signum_reg asm("ebx") = signum; + register long act_reg asm("ecx") = (long)act; + register long oldact_reg asm("edx") = (long)oldact; + int ret = 0; + + asm volatile ("int $0x80;" + : "=a"(ret), "=m"(oldact) + : "r"(syscall_reg), "r"(signum_reg), "r"(act_reg), + "r"(oldact_reg) + : "r8", "r9", "r10", "r11" + ); + + return ret; +} + +sigjmp_buf jmp_buffer; + +void segv_gp_handler(int signum, siginfo_t *si, void *uc) +{ + segv_triggered = true; + + /* + * To work with old glibc, this can't rely on siglongjmp working with + * shadow stack enabled, so disable shadow stack before siglongjmp(). + */ + ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK); + siglongjmp(jmp_buffer, -1); +} + +/* + * Transition to 32 bit mode and check that a #GP triggers a segfault. 
+ */ +int test_32bit(void) +{ + struct sigaction sa = {}; + struct sigaction *sa32; + + /* Create sigaction in 32 bit address range */ + sa32 = mmap(0, 4096, PROT_READ | PROT_WRITE, + MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); + sa32->sa_flags = SA_SIGINFO; + + sa.sa_sigaction = segv_gp_handler; + sa.sa_flags = SA_SIGINFO; + if (sigaction(SIGSEGV, &sa, NULL)) + return 1; + + segv_triggered = false; + + /* Make sure segv_triggered is set before triggering the #GP */ + asm volatile("" : : : "memory"); + + /* + * Set handler to somewhere in 32 bit address space + */ + sa32->sa_handler = (void *)sa32; + if (sigaction32(SIGUSR1, sa32, NULL)) + return 1; + + if (!sigsetjmp(jmp_buffer, 1)) + raise(SIGUSR1); + + if (segv_triggered) + printf("[OK]\t32 bit test\n"); + + return !segv_triggered; +} + +int main(int argc, char *argv[]) +{ + int ret = 0; + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not enable shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) { + printf("[SKIP]\tCould not re-enable shadow stack\n"); + return 1; + } + + if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS)) { + printf("[SKIP]\tCould not enable WRSS\n"); + ret = 1; + goto out; + } + + /* Should have succeeded if here, but this is a test, so double check. 
*/ + if (!get_ssp()) { + printf("[FAIL]\tShadow stack disabled\n"); + return 1; + } + + if (test_shstk_pivot()) { + ret = 1; + printf("[FAIL]\tShadow stack pivot\n"); + goto out; + } + + if (test_shstk_faults()) { + ret = 1; + printf("[FAIL]\tShadow stack fault test\n"); + goto out; + } + + if (test_shstk_violation()) { + ret = 1; + printf("[FAIL]\tShadow stack violation test\n"); + goto out; + } + + if (test_gup()) { + ret = 1; + printf("[FAIL]\tShadow stack gup test\n"); + goto out; + } + + if (test_mprotect()) { + ret = 1; + printf("[FAIL]\tShadow stack mprotect test\n"); + goto out; + } + + if (test_userfaultfd()) { + ret = 1; + printf("[FAIL]\tUserfaultfd test\n"); + goto out; + } + + if (test_32bit()) { + ret = 1; + printf("[FAIL]\t32 bit test\n"); + } + + return ret; + +out: + /* + * Disable shadow stack before the function returns, or there will be a + * shadow stack violation. + */ + if (ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK)) { + ret = 1; + printf("[FAIL]\tDisabling shadow stack failed\n"); + } + + return ret; +} +#endif From patchwork Sun Mar 19 00:15:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Edgecombe, Rick P" X-Patchwork-Id: 13180166 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7B95C7619A for ; Sun, 19 Mar 2023 00:17:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7EC5B280025; Sat, 18 Mar 2023 20:17:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 776F2280001; Sat, 18 Mar 2023 20:17:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 57A6D280025; Sat, 18 Mar 2023 20:17:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com 
(smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 3A7C8280001 for ; Sat, 18 Mar 2023 20:17:01 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 0D54E1C6431 for ; Sun, 19 Mar 2023 00:17:01 +0000 (UTC) X-FDA: 80583732642.23.B7660BB Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by imf26.hostedemail.com (Postfix) with ESMTP id DEFE914000C for ; Sun, 19 Mar 2023 00:16:58 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=e8y01sCK; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1679185019; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references:dkim-signature; bh=yHuJtH0ZX4t1G+OD4X4Fg0aKTiRaA6fz+P/1xCa+LZ0=; b=gew5gPlzZcGqFsJfnoJ8OfKW+/rLMNeRUDFZq1O+0U1cnEmDXt6XwIXIiyWW9soAxGasGp lZ4cc/HRPcM8ufBV34uoBVmLiUKQR5PTKrjM6n17fJ5xqGosGjGky5ngEXQeVbC8lhH3aD aj6IcksOeohtTdVYl5ZqlR4oDTf6Kqg= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=e8y01sCK; spf=pass (imf26.hostedemail.com: domain of rick.p.edgecombe@intel.com designates 192.55.52.115 as permitted sender) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1679185019; a=rsa-sha256; cv=none; b=a/taBgCY2z1PkD7T8xYCOTXwnw9N0Rj95j/ZLHkU1sNIP8FB8NubQYiS8sdMGkp7WszWPJ K+E9fpYVWjcBxoUJPMBQMx+yH1VjlAElqux0XApW02CBppJ6PWCa6cPJfklVYGEE+jQot1 3Bd2kN1GbYmO7M4YBNd/h0SEH1oA0wU= DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1679185018; x=1710721018; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vzjvLf4i3g21tZUx0No2irY3Y9uThaxbSSa7EUI0VaM=; b=e8y01sCK0JwBBsx2F9XwLWdy5HRY55eGIjfcVwFjNA5FDUTOaycQrEiL gYNhNFx+4zkm52mW2UwMWQlt+r5nEpkBIpk67ydF1bFmC7LG4k3C84etS SKUOBKEDjJiuHo/eMZ32Raz1bDdl7il1j4DBz7gjRvRVtPk98x8yAiYpS Zk6SvhbPaL1ysXDFqKvqOZ0WOwkjHV2cjbCEX52pxZR4n/pa/yUucZYm9 iBnlzybl2hiRq6cWkUV2KkPgWunERIwMGswpBhK0IYqjxX8/BBau4kVvw fQPMakdfbjXMj0hBVBgW8uhwjtkn01tiYpxWmA5dFW30L5hEdaQWKQ7f6 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="338491592" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="338491592" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10653"; a="749673008" X-IronPort-AV: E=Sophos;i="5.98,272,1673942400"; d="scan'208";a="749673008" Received: from bmahatwo-mobl1.gar.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.135.34.5]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Mar 2023 17:16:56 -0700 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , Weijiang Yang , "Kirill A . 
Shutemov" , John Allen , kcc@google.com, eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com, akpm@linux-foundation.org, Andrew.Cooper3@citrix.com, christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com, szabolcs.nagy@arm.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH v8 38/40] x86: Add PTRACE interface for shadow stack Date: Sat, 18 Mar 2023 17:15:33 -0700 Message-Id: <20230319001535.23210-39-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com> References: <20230319001535.23210-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DEFE914000C X-Stat-Signature: ghsweffz7ptew44q4rkygiugwxa3y5mp X-Rspam-User: X-HE-Tag: 1679185018-186212 X-HE-Meta: U2FsdGVkX18pDGPhH4lT8rnaYe2yNof1uHCHzo2Ne0bdjghf7kjA4iwCHHGfd6F27GiXkZ1Wj1XVE+Nx/pe+dSoJPS0pNWbcQxoZVt16+mpHKWShuAosbCZyzjNlRheS7tnv57VKjrHOFqPMlig6R3xqbKL6y2oJozLLwhZo5vWhnuNPwBXxU/I1yGcGC7W597cAdONkS/RqfCPSGEOcUtuYLmyJOMUhZzAHxpFGfxLzN+gfl1XB33lQvHvpowEpzVIo4gxEJKW5vTNAeXv8FwAkfhRa1/DZn9gM7oZJCyk5Ozvfxbde9sCKe7iy/2zhvxKqRYabEZjdbfJseNEhvsiEjBb25xB2mqGiJ7lMCO/qQ7CSjfiNPCy1yvsmjqzpBEdmn8HkY66EcBVzIswDdONL8O2gE1O4E+YlelMTMjBCi34XvYrp8L/RnzOAm7YRaYDjNvLVYfBe3gm95t4NgiSAM9oW0cVbg5Wm47KL5ruQM8Dgt/rDCrqv25PKmKGkfJJblZh6j5bcg8ud7sFNqXsB35N99U4giQiqpZ1ZssWTZ7178XLRS+durZ/KVOCgNpxD5VHdISm1mQ1HTfANqOicQ9PmBGoemWxXwm7tzYasJSPsPNU8UFpGdbTIX0cpM1USaIVjs8L12Zoe5/ixCieyKNptCaZey3i9n3/+TpNcNF03jJIB89sTXgkDdTvdgvwlC2cBxLQbybxKg4OO7YMdYBwCovORVobV7KeihwvYDGj6L529FPG+vcX7PxbmcxeasRv8oKEaDX5XWMJ3o7q2vScJ9RQjVlqoplifBQgp4yuCCCEdYwtrU4wsGuZBQ54LM7fSnQ/nkFP1blIWCnj86nK5TWTL1co9qjQ3nuf1LydYyZlLPvcHez8l7KeoY4I6ykLpIGRdxejpWT2rO6aD+CyEdK4AfRK+6cfauNWyCAcrIGRw4YuKVr4RkLUisep24kg15KMYiPXuuaG 7XXawkr3 
MQPWZgp66tevWDFYOmyH89vU9Cp7u3CUijBPZbEPlyBfOwQ/cYHTBngDdPvB39r1ffjUUD2biZLfKjlJHO8M9NZ+4jLd3RJGyxB5Yhn06Vg1AsQOst0NUK6eVJ8Sqi8ffv7yj6EVXhgZYkdUVqAuakeMfwTZxVSxWk7QMp016zcW22/BhR7rWHGW5I29VGael2LDdsljOB03wYGilU/5kEkwxHSRJ0Q7b+3hGeZB553uukN2FSSh476ktoA9QiiyLImHdDHyepbbvsEnYNxX0L0XD83nOpC3PFSH5EzsrfUFRb9d1SqYVlr0GbNATrYCiq+Vf2+yZ5S5D1h+cyFbUtIPNBcWTCq7Kb+Pc X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Some applications (like GDB) would like to tweak shadow stack state via ptrace. This allows for existing functionality to continue to work for seized shadow stack applications. Provide a regset interface for manipulating the shadow stack pointer (SSP). There is already ptrace functionality for accessing xstate, but this does not include supervisor xfeatures. So there is not a completely clear place for where to put the shadow stack state. Adding it to the user xfeatures regset would complicate that code, as it currently shares logic with signals which should not have supervisor features. Don't add a general supervisor xfeature regset like the user one, because it is better to maintain flexibility for other supervisor xfeatures to define their own interface. For example, an xfeature may decide not to expose all of it's state to userspace, as is actually the case for shadow stack ptrace functionality. A lot of enum values remain to be used, so just put it in dedicated shadow stack regset. The only downside to not having a generic supervisor xfeature regset, is that apps need to be enlightened of any new supervisor xfeature exposed this way (i.e. they can't try to have generic save/restore logic). But maybe that is a good thing, because they have to think through each new xfeature instead of encountering issues when a new supervisor xfeature was added. 
Adding a shadow stack regset also has the effect of including the shadow
stack state in a core dump, which could be useful for debugging.

The shadow stack specific xstate includes the SSP, and the shadow stack
and WRSS enablement status. Enabling shadow stack or WRSS in the kernel
involves more than just flipping the bit. The kernel is made aware that
it has to do extra things when cloning or handling signals. That logic is
triggered off of separate feature enablement state kept in the task
struct. So flipping on HW shadow stack enforcement without notifying the
kernel to change its behavior would severely limit what an application
could do without crashing, and the results would depend on kernel
internal implementation details. There is also no known use for
controlling this state via ptrace today. So only expose the SSP, which is
something that userspace already has indirect control over.

Co-developed-by: Yu-cheng Yu
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Update commit log verbiage (Boris)
 - Stop using init_xfeature() and just return an error if the init state
   is encountered, since it shouldn't be. (Boris)

v5:
 - Check shadow stack enablement status for tracee (rppt)
 - Fix typo in comment

v4:
 - Make shadow stack only. Reduce to only supporting SSP register, and
   remove CET references (peterz)
 - Add comment to not use 0x203, because binutils already looks for it in
   coredumps. (Christina Schimpe)

v3:
 - Drop dependence on thread.shstk.size, and use thread.features bits
 - Drop 32 bit support
---
 arch/x86/include/asm/fpu/regset.h |  7 +--
 arch/x86/kernel/fpu/regset.c      | 78 +++++++++++++++++++++++++++++++
 arch/x86/kernel/ptrace.c          | 12 +++++
 include/uapi/linux/elf.h          |  2 +
 4 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/regset.h b/arch/x86/include/asm/fpu/regset.h
index 4f928d6a367b..697b77e96025 100644
--- a/arch/x86/include/asm/fpu/regset.h
+++ b/arch/x86/include/asm/fpu/regset.h
@@ -7,11 +7,12 @@

 #include

-extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active;
+extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active,
+				ssp_active;
 extern user_regset_get2_fn fpregs_get, xfpregs_get, fpregs_soft_get,
-				 xstateregs_get;
+				 xstateregs_get, ssp_get;
 extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
-				 xstateregs_set;
+				 xstateregs_set, ssp_set;

 /*
  * xstateregs_active == regset_fpregs_active. Please refer to the comment
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 6d056b68f4ed..f0a8eaf7c52e 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include

 #include "context.h"
 #include "internal.h"
@@ -174,6 +175,83 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
 	return ret;
 }

+#ifdef CONFIG_X86_USER_SHADOW_STACK
+int ssp_active(struct task_struct *target, const struct user_regset *regset)
+{
+	if (target->thread.features & ARCH_SHSTK_SHSTK)
+		return regset->n;
+
+	return 0;
+}
+
+int ssp_get(struct task_struct *target, const struct user_regset *regset,
+	    struct membuf to)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct cet_user_state *cetregs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK))
+		return -ENODEV;
+
+	sync_fpstate(fpu);
+	cetregs = get_xsave_addr(&fpu->fpstate->regs.xsave, XFEATURE_CET_USER);
+	if (WARN_ON(!cetregs)) {
+		/*
+		 * This shouldn't ever be NULL because shadow stack was
+		 * verified to be enabled above. This means
+		 * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
+		 * XFEATURE_CET_USER should not be in the init state.
+		 */
+		return -ENODEV;
+	}
+
+	return membuf_write(&to, (unsigned long *)&cetregs->user_ssp,
+			    sizeof(cetregs->user_ssp));
+}
+
+int ssp_set(struct task_struct *target, const struct user_regset *regset,
+	    unsigned int pos, unsigned int count,
+	    const void *kbuf, const void __user *ubuf)
+{
+	struct fpu *fpu = &target->thread.fpu;
+	struct xregs_state *xsave = &fpu->fpstate->regs.xsave;
+	struct cet_user_state *cetregs;
+	unsigned long user_ssp;
+	int r;
+
+	if (!cpu_feature_enabled(X86_FEATURE_USER_SHSTK) ||
+	    !ssp_active(target, regset))
+		return -ENODEV;
+
+	r = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &user_ssp, 0, -1);
+	if (r)
+		return r;
+
+	/*
+	 * Some kernel instructions (IRET, etc) can cause exceptions in the case
+	 * of disallowed CET register values. Just prevent invalid values.
+	 */
+	if (user_ssp >= TASK_SIZE_MAX || !IS_ALIGNED(user_ssp, 8))
+		return -EINVAL;
+
+	fpu_force_restore(fpu);
+
+	cetregs = get_xsave_addr(xsave, XFEATURE_CET_USER);
+	if (WARN_ON(!cetregs)) {
+		/*
+		 * This shouldn't ever be NULL because shadow stack was
+		 * verified to be enabled above. This means
+		 * MSR_IA32_U_CET.CET_SHSTK_EN should be 1 and so
+		 * XFEATURE_CET_USER should not be in the init state.
+		 */
+		return -ENODEV;
+	}
+
+	cetregs->user_ssp = user_ssp;
+	return 0;
+}
+#endif /* CONFIG_X86_USER_SHADOW_STACK */
+
 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION

 /*
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index dfaa270a7cc9..095f04bdabdc 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -58,6 +58,7 @@ enum x86_regset_64 {
 	REGSET64_FP,
 	REGSET64_IOPERM,
 	REGSET64_XSTATE,
+	REGSET64_SSP,
 };

 #define REGSET_GENERAL \
@@ -1267,6 +1268,17 @@ static struct user_regset x86_64_regsets[] __ro_after_init = {
 		.active		= ioperm_active,
 		.regset_get	= ioperm_get
 	},
+#ifdef CONFIG_X86_USER_SHADOW_STACK
+	[REGSET64_SSP] = {
+		.core_note_type	= NT_X86_SHSTK,
+		.n		= 1,
+		.size		= sizeof(u64),
+		.align		= sizeof(u64),
+		.active		= ssp_active,
+		.regset_get	= ssp_get,
+		.set		= ssp_set
+	},
+#endif
 };

 static const struct user_regset_view user_x86_64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ac3da855fb19..fa1ceeae2596 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -406,6 +406,8 @@ typedef struct elf64_shdr {
 #define NT_386_TLS	0x200		/* i386 TLS slots (struct user_desc) */
 #define NT_386_IOPERM	0x201		/* x86 io permission bitmap (1=deny) */
 #define NT_X86_XSTATE	0x202		/* x86 extended state using xsave */
+/* Old binutils treats 0x203 as a CET state */
+#define NT_X86_SHSTK	0x204		/* x86 SHSTK state */
 #define NT_S390_HIGH_GPRS	0x300	/* s390 upper register halves */
 #define NT_S390_TIMER	0x301		/* s390 timer register */
 #define NT_S390_TODCMP	0x302		/* s390 TOD clock comparator register */

From patchwork Sun Mar 19 00:15:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180167
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com, Mike Rapoport
Subject: [PATCH v8 39/40] x86/shstk: Add ARCH_SHSTK_UNLOCK
Date: Sat, 18 Mar 2023 17:15:34 -0700
Message-Id: <20230319001535.23210-40-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

From: Mike Rapoport

Userspace loaders may lock features before a CRIU restore operation has
the chance to set them to whatever state is required by the process being
restored. Provide a way for CRIU to unlock features. Add it as an
arch_prctl() like the other shadow stack operations, but restrict it to
being called through the ptrace arch_prctl() interface.

[Merged into recent API changes, added commit log and docs]
Signed-off-by: Mike Rapoport
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v8:
 - Remove Mike's ack from his own patch (Boris)

v4:
 - Add to docs that it is ptrace only.
 - Remove "CET" references

v3:
 - Depend on CONFIG_CHECKPOINT_RESTORE (Kees)
---
 Documentation/x86/shstk.rst       | 4 ++++
 arch/x86/include/uapi/asm/prctl.h | 1 +
 arch/x86/kernel/process_64.c      | 1 +
 arch/x86/kernel/shstk.c           | 9 +++++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst
index f09afa504ec0..f3553cc8c758 100644
--- a/Documentation/x86/shstk.rst
+++ b/Documentation/x86/shstk.rst
@@ -75,6 +75,10 @@ arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
     are ignored. The mask is ORed with the existing value. So any feature bits
     set here cannot be enabled or disabled afterwards.

+arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
+    Unlock features. 'features' is a mask of all features to unlock. All
+    bits set are processed, unset bits are ignored. Only works via ptrace.
+
 The return values are as follows. On success, return 0. On error, errno can
 be::

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index e31495668056..200efbbe5809 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -25,6 +25,7 @@
 #define ARCH_SHSTK_ENABLE		0x5001
 #define ARCH_SHSTK_DISABLE		0x5002
 #define ARCH_SHSTK_LOCK			0x5003
+#define ARCH_SHSTK_UNLOCK		0x5004

 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK		(1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9bbad1763e33..69d4ccaef56f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -835,6 +835,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_SHSTK_ENABLE:
 	case ARCH_SHSTK_DISABLE:
 	case ARCH_SHSTK_LOCK:
+	case ARCH_SHSTK_UNLOCK:
 		return shstk_prctl(task, option, arg2);
 	default:
 		ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index ee89c4206ac9..ad336ab55ace 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -459,9 +459,14 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long features)
 		return 0;
 	}

-	/* Don't allow via ptrace */
-	if (task != current)
+	/* Only allow via ptrace */
+	if (task != current) {
+		if (option == ARCH_SHSTK_UNLOCK && IS_ENABLED(CONFIG_CHECKPOINT_RESTORE)) {
+			task->thread.features_locked &= ~features;
+			return 0;
+		}
 		return -EINVAL;
+	}

 	/* Do not allow to change locked features */
 	if (features & task->thread.features_locked)

From patchwork Sun Mar 19 00:15:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 13180168
From: Rick Edgecombe
To: x86@kernel.org, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
 Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
 Eugene Syromiatnikov, Florian Weimer, "H . J . Lu", Jann Horn,
 Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, Weijiang Yang,
 "Kirill A . Shutemov", John Allen, kcc@google.com, eranian@google.com,
 rppt@kernel.org, jamorris@linux.microsoft.com, dethoma@microsoft.com,
 akpm@linux-foundation.org, Andrew.Cooper3@citrix.com,
 christina.schimpe@intel.com, david@redhat.com, debug@rivosinc.com,
 szabolcs.nagy@arm.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v8 40/40] x86/shstk: Add ARCH_SHSTK_STATUS
Date: Sat, 18 Mar 2023 17:15:35 -0700
Message-Id: <20230319001535.23210-41-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230319001535.23210-1-rick.p.edgecombe@intel.com>
References: <20230319001535.23210-1-rick.p.edgecombe@intel.com>

CRIU and GDB need to get the current shadow stack and WRSS enablement
status. This information is already available via /proc/pid/status, but
this is inconvenient for CRIU because it involves parsing the text output
in an area of the code where this is difficult. Provide a status
arch_prctl(), ARCH_SHSTK_STATUS for retrieving the status.
Have arg2 be a userspace address, and make the new arch_prctl simply copy
the features out to userspace.

Suggested-by: Mike Rapoport
Signed-off-by: Rick Edgecombe
Reviewed-by: Kees Cook
Acked-by: Mike Rapoport (IBM)
Tested-by: Pengfei Xu
Tested-by: John Allen
Tested-by: Kees Cook
---
v5:
 - Fix typo in commit log

v4:
 - New patch
---
 Documentation/x86/shstk.rst       | 6 ++++++
 arch/x86/include/asm/shstk.h      | 2 +-
 arch/x86/include/uapi/asm/prctl.h | 1 +
 arch/x86/kernel/process_64.c      | 1 +
 arch/x86/kernel/shstk.c           | 8 +++++++-
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/shstk.rst b/Documentation/x86/shstk.rst
index f3553cc8c758..60260e809baf 100644
--- a/Documentation/x86/shstk.rst
+++ b/Documentation/x86/shstk.rst
@@ -79,6 +79,11 @@ arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
     Unlock features. 'features' is a mask of all features to unlock. All
     bits set are processed, unset bits are ignored. Only works via ptrace.

+arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
+    Copy the currently enabled features to the address passed in addr. The
+    features are described using the bits passed into the others in
+    'features'.
+
 The return values are as follows. On success, return 0. On error, errno can
 be::

@@ -86,6 +91,7 @@ be::
         -ENOTSUPP if the feature is not supported by the hardware or
          kernel.
         -EINVAL arguments (non existing feature, etc)
+        -EFAULT if could not copy information back to userspace

 The feature's bits supported are::

diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index ecb23a8ca47d..42fee8959df7 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -14,7 +14,7 @@ struct thread_shstk {
 	u64	size;
 };

-long shstk_prctl(struct task_struct *task, int option, unsigned long features);
+long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
 void reset_thread_features(void);
 unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
 				       unsigned long stack_size);
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 200efbbe5809..1b85bc876c2d 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -26,6 +26,7 @@
 #define ARCH_SHSTK_DISABLE		0x5002
 #define ARCH_SHSTK_LOCK			0x5003
 #define ARCH_SHSTK_UNLOCK		0x5004
+#define ARCH_SHSTK_STATUS		0x5005

 /* ARCH_SHSTK_ features bits */
 #define ARCH_SHSTK_SHSTK		(1ULL << 0)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 69d4ccaef56f..31241930b60c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -836,6 +836,7 @@ long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 	case ARCH_SHSTK_DISABLE:
 	case ARCH_SHSTK_LOCK:
 	case ARCH_SHSTK_UNLOCK:
+	case ARCH_SHSTK_STATUS:
 		return shstk_prctl(task, option, arg2);
 	default:
 		ret = -EINVAL;
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index ad336ab55ace..1f767c509ee9 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -452,8 +452,14 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
 	return alloc_shstk(addr, aligned_size, size, set_tok);
 }

-long shstk_prctl(struct task_struct *task, int option, unsigned long features)
+long shstk_prctl(struct task_struct *task, int
option, unsigned long arg2) { + unsigned long features = arg2; + + if (option == ARCH_SHSTK_STATUS) { + return put_user(task->thread.features, (unsigned long __user *)arg2); + } + if (option == ARCH_SHSTK_LOCK) { task->thread.features_locked |= features; return 0;