From patchwork Sun Jan 30 21:18:04 2022
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 01/35] Documentation/x86: Add CET description Date: Sun, 30 Jan 2022 13:18:04 -0800 Message-Id: <20220130211838.8382-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 6F977A0003 X-Stat-Signature: 98a3ni5fiqj7rehimzk3pmpj1famuoea Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=ZRn5+pih; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf15.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-HE-Tag: 1643577709-299930 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Introduce a new document on Control-flow Enforcement Technology (CET). Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v1: - Update and clarify the docs. - Moved kernel parameters documentation to other patch. Documentation/x86/cet.rst | 145 ++++++++++++++++++++++++++++++++++++ Documentation/x86/index.rst | 1 + 2 files changed, 146 insertions(+) create mode 100644 Documentation/x86/cet.rst diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst new file mode 100644 index 000000000000..ff0f9a148959 --- /dev/null +++ b/Documentation/x86/cet.rst @@ -0,0 +1,145 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Control-flow Enforcement Technology (CET) +========================================= + +[1] Overview +============ + +Control-flow Enforcement Technology (CET) is term referring to several +related x86 processor features that provides protection against control +flow hijacking attacks. The HW feature itself can be set up to protect +both applications and the kernel. Only user-mode protection is implemented +in the 64-bit kernel, including shadow stack support for running legacy +32-bit applications. + +CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is +a secondary stack allocated from memory and cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. Indirect branch tracking verifies indirect +CALL/JMP targets are intended as marked by the compiler with 'ENDBR' +opcodes. Not all CPU's have both Shadow Stack and Indirect Branch Tracking +and only Shadow Stack is currently supported in the kernel. + +The Kconfig options is X86_SHADOW_STACK, and it can be disabled with +no_user_shstk. + +To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1 +or later are required. To build a CET-enabled application, GLIBC v2.28 or +later is also required. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. 
 Documentation/x86/cet.rst   | 145 ++++++++++++++++++++++++++++++++++++
 Documentation/x86/index.rst |   1 +
 2 files changed, 146 insertions(+)
 create mode 100644 Documentation/x86/cet.rst

diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst
new file mode 100644
index 000000000000..ff0f9a148959
--- /dev/null
+++ b/Documentation/x86/cet.rst
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Control-flow Enforcement Technology (CET)
+=========================================
+
+[1] Overview
+============
+
+Control-flow Enforcement Technology (CET) is a term referring to several
+related x86 processor features that provide protection against control
+flow hijacking attacks. The hardware feature itself can be set up to
+protect both applications and the kernel. Only user-mode protection is
+implemented in the 64-bit kernel, including shadow stack support for
+running legacy 32-bit applications.
+
+CET introduces Shadow Stack and Indirect Branch Tracking. A shadow stack
+is a secondary stack allocated from memory that cannot be directly
+modified by applications. When executing a CALL instruction, the processor
+pushes the return address to both the normal stack and the shadow stack.
+Upon function return, the processor pops the shadow stack copy and
+compares it to the normal stack copy. If the two differ, the processor
+raises a control-protection fault. Indirect Branch Tracking verifies that
+indirect CALL/JMP targets are intended, as marked by the compiler with
+'ENDBR' opcodes. Not all CPUs have both Shadow Stack and Indirect Branch
+Tracking, and only Shadow Stack is currently supported in the kernel.
+
+The Kconfig option is X86_SHADOW_STACK, and the feature can be disabled
+with the no_user_shstk kernel parameter.
+
+To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
+or later are required. To build a CET-enabled application, GLIBC v2.28 or
+later is also required.
+
+At run time, /proc/cpuinfo shows CET features if the processor supports
+CET.
+
+[2] Application Enabling
+========================
+
+An application's CET capability is marked in its ELF header and can be
+verified from readelf/llvm-readelf output:
+
+    readelf -n <application> | grep -a SHSTK
+        properties: x86 feature: SHSTK
+
+The kernel does not process these ELF properties directly. Applications
+must enable CET features using the interface described in section 4.
+Typically this would be done in the dynamic loader or static runtime
+objects, as is the case in glibc.
+
+[3] Backward Compatibility
+==========================
+
+GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
+variable:
+
+GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS
+    Turn off SHSTK/WRSS.
+
+GLIBC_TUNABLES=glibc.tune.x86_shstk=<on/permissive>
+    This controls how dlopen() handles SHSTK legacy libraries::
+
+        on         - continue with SHSTK enabled;
+        permissive - continue with SHSTK off.
+
+Details can be found in the GLIBC manual pages.
+
+[4] CET arch_prctl()'s
+======================
+
+ELF features are enabled using the below arch_prctl()'s.
+
+arch_prctl(ARCH_X86_FEATURE_STATUS, u64 *args)
+    Get feature status.
+
+    The parameter 'args' is a pointer to a user buffer. The kernel returns
+    the following information:
+
+    *args = shadow stack/IBT status
+    *(args + 1) = shadow stack base address
+    *(args + 2) = shadow stack size
+
+    32-bit binaries use the same interface, but only the lower 32 bits of
+    each item.
+
+arch_prctl(ARCH_X86_FEATURE_DISABLE, unsigned int features)
+    Disable features specified in 'features'. Return -EPERM if any of the
+    passed features are locked. Return -ECANCELED if any of the features
+    failed to disable. In this case call ARCH_X86_FEATURE_STATUS to find
+    out which features are still enabled.
+
+arch_prctl(ARCH_X86_FEATURE_ENABLE, unsigned int features)
+    Enable features specified in 'features'. Return -EPERM if any of the
+    passed features are locked. Return -ECANCELED if any of the features
+    failed to enable. In this case call ARCH_X86_FEATURE_STATUS to find
+    out which features were enabled.
+
+arch_prctl(ARCH_X86_FEATURE_LOCK, unsigned int features)
+    Lock in all features at their current enabled or disabled status.
+
+Currently shadow stack and WRSS are supported via this interface. WRSS
+can only be enabled with shadow stack, and is automatically disabled
+if shadow stack is disabled.
+
+[5] The implementation of the Shadow Stack
+==========================================
+
+Shadow Stack size
+-----------------
+
+A task's shadow stack is allocated from memory with a fixed size of
+MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+the maximum size of the normal stack, but capped to 4 GB. However, since
+a compat-mode application's address space is smaller, each of its threads'
+shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
+
+Signal
+------
+
+The main program and its signal handlers use the same shadow stack.
+Because the shadow stack stores only return addresses, a large shadow
+stack covers the condition that both the program stack and the signal
+alternate stack run out.
+
+The kernel creates a restore token for the shadow stack and pushes the
+restorer address to the shadow stack. It then verifies that token when
+restoring from the signal handler.
+
+Fork
+----
+
+The shadow stack's vma has the VM_SHADOW_STACK flag set; its PTEs are
+required to be read-only and dirty. When a shadow stack PTE is not RO and
+dirty, a shadow access triggers a page fault with the shadow stack access
+bit set in the page fault error code.
+
+When a task forks a child, its shadow stack PTEs are copied and both the
+parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+Upon the next shadow stack access, the resulting shadow stack page fault
+is handled by page copy/re-use.
+
+When a pthread child is created, the kernel allocates a new shadow stack
+for the new thread.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..b5f083a61eab 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -21,6 +21,7 @@ x86-specific Documentation
    tlb
    mtrr
    pat
+   cet
    intel-iommu
    intel_txt
    amd-memory-encryption
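As a usage illustration of the section 4 interface described in the new
document, here is a minimal sketch. It assumes the ARCH_X86_FEATURE_*
constants are exported through the uapi headers of this series (shown as
asm/prctl.h here); the header location and constant values are not final.

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/prctl.h>      /* assumed home of ARCH_X86_FEATURE_* */

    /* Sketch: query shadow stack status via the proposed arch_prctl()'s. */
    int main(void)
    {
            uint64_t buf[3] = { 0 };

            /* glibc has no arch_prctl() wrapper; go through syscall(2). */
            if (syscall(SYS_arch_prctl, ARCH_X86_FEATURE_STATUS, buf)) {
                    perror("arch_prctl");
                    return 1;
            }

            printf("features: %#llx\n", (unsigned long long)buf[0]);
            printf("ssp base: %#llx\n", (unsigned long long)buf[1]);
            printf("ssp size: %#llx\n", (unsigned long long)buf[2]);
            return 0;
    }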
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 02/35] x86/cet/shstk: Add Kconfig option for Shadow Stack Date: Sun, 30 Jan 2022 13:18:05 -0800 Message-Id: <20220130211838.8382-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 37592120005 X-Stat-Signature: 1xhmawssjqq9bjpe1wmysj5k7qzzcuap Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Clm277iK; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf29.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspam-User: nil X-HE-Tag: 1643577710-569134 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Shadow Stack provides protection against function return address corruption. It is active when the processor supports it, the kernel has CONFIG_X86_SHADOW_STACK enabled, and the application is built for the feature. This is only implemented for the 64-bit kernel. When it is enabled, legacy non-Shadow Stack applications continue to work, but without protection. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook --- Yu-cheng v25: - Remove X86_CET and use X86_SHADOW_STACK directly. Yu-cheng v24: - Update for the splitting X86_CET to X86_SHADOW_STACK and X86_IBT. arch/x86/Kconfig | 22 ++++++++++++++++++++++ arch/x86/Kconfig.assembler | 5 +++++ 2 files changed, 27 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ebe8fc76949a..b9efa0fd906d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -26,6 +26,7 @@ config X86_64 depends on 64BIT # Options that are inherently 64-bit kernel only: select ARCH_HAS_GIGANTIC_PAGE + select ARCH_HAS_SHADOW_STACK select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 select ARCH_USE_CMPXCHG_LOCKREF select HAVE_ARCH_SOFT_DIRTY @@ -1940,6 +1941,27 @@ config X86_SGX If unsure, say N. +config ARCH_HAS_SHADOW_STACK + def_bool n + +config X86_SHADOW_STACK + prompt "Intel Shadow Stack" + def_bool n + depends on AS_WRUSS + depends on ARCH_HAS_SHADOW_STACK + select ARCH_USES_HIGH_VMA_FLAGS + help + Shadow Stack protection is a hardware feature that detects function + return address corruption. This helps mitigate ROP attacks. + Applications must be enabled to use it, and old userspace does not + get protection "for free". + Support for this feature is present on Tiger Lake family of + processors released in 2020 or later. Enabling this feature + increases kernel text size by 3.7 KB. + See Documentation/x86/intel_cet.rst for more information. + + If unsure, say N. 
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 26b8c08e2fc4..00c79dd93651 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -19,3 +19,8 @@ config AS_TPAUSE
 	def_bool $(as-instr,tpause %ecx)
 	help
 	  Supported by binutils >= 2.31.1 and LLVM integrated assembler >= V7
+
+config AS_WRUSS
+	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
+	help
+	  Supported by binutils >= 2.31 and LLVM integrated assembler
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 03/35] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET) Date: Sun, 30 Jan 2022 13:18:06 -0800 Message-Id: <20220130211838.8382-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 4CABAA0002 X-Stat-Signature: oquak3fxr897shksiukq6zmq46gsedug Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=Ddp1gOp7; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf15.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-HE-Tag: 1643577710-601695 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Add CPU feature flags for Control-flow Enforcement Technology (CET). CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect Branch Tracking Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v1: - Remove IBT, can be added in a follow on IBT series. Yu-cheng v25: - Make X86_FEATURE_IBT depend on X86_FEATURE_SHSTK. Yu-cheng v24: - Update for splitting CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK and CONFIG_X86_IBT. - Move DISABLE_IBT definition to the IBT series. 
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6db4e2932b3d..c3eb94b13fef 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -355,6 +355,7 @@
 #define X86_FEATURE_OSPKE		(16*32+ 4) /* OS Protection Keys Enable */
 #define X86_FEATURE_WAITPKG		(16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
 #define X86_FEATURE_AVX512_VBMI2	(16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK		(16*32+ 7) /* Shadow Stack */
 #define X86_FEATURE_GFNI		(16*32+ 8) /* Galois Field New Instructions */
 #define X86_FEATURE_VAES		(16*32+ 9) /* Vector AES */
 #define X86_FEATURE_VPCLMULQDQ		(16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..b7728f7afb2b 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -65,6 +65,12 @@
 # define DISABLE_SGX	(1 << (X86_FEATURE_SGX & 31))
 #endif
 
+#ifdef CONFIG_X86_SHADOW_STACK
+#define DISABLE_SHSTK	0
+#else
+#define DISABLE_SHSTK	(1 << (X86_FEATURE_SHSTK & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -85,7 +91,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_SHSTK)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..bf1b55a1ba21 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,		X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,		X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,		X86_FEATURE_XFD       },
+	{ X86_FEATURE_SHSTK,		X86_FEATURE_XSAVES    },
 	{}
 };
From patchwork Sun Jan 30 21:18:07 2022
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH 04/35] x86/cpufeatures: Introduce CPU setup and option parsing for CET
Date: Sun, 30 Jan 2022 13:18:07 -0800
Message-Id: <20220130211838.8382-5-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Introduce CPU setup and boot option parsing for CET features.

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---
v1:
 - Moved kernel-parameters.txt changes here from patch 1.

Yu-cheng v25:
 - Remove software-defined X86_FEATURE_CET.

Yu-cheng v24:
 - Update #ifdef placement to reflect Kconfig changes of splitting shadow
   stack and IBT.
 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 arch/x86/include/uapi/asm/processor-flags.h     |  2 ++
 arch/x86/kernel/cpu/common.c                    | 12 ++++++++++++
 3 files changed, 18 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..6c5456c56dbf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3389,6 +3389,10 @@
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	no_user_shstk	[X86-64] Disable Shadow Stack for user-mode
+			applications. Disabling shadow stack also disables
+			IBT.
+
 	nosmap		[X86,PPC]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..a8df907e8017 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -130,6 +130,8 @@
 #define X86_CR4_SMAP		_BITUL(X86_CR4_SMAP_BIT)
 #define X86_CR4_PKE_BIT		22 /* enable Protection Keys support */
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
+#define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement */
+#define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
 
 /*
  * x86-64 Task Priority Register, CR8
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7b8382c11788..9ee339f5b8ca 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -515,6 +515,14 @@ static __init int setup_disable_pku(char *arg)
 __setup("nopku", setup_disable_pku);
 #endif /* CONFIG_X86_64 */
 
+static __always_inline void setup_cet(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return;
+
+	cr4_set_bits(X86_CR4_CET);
+}
+
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
@@ -1261,6 +1269,9 @@ static void __init cpu_parse_early_param(void)
 	if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
 		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
 
+	if (cmdline_find_option_bool(boot_command_line, "no_user_shstk"))
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+
 	arglen = cmdline_find_option(boot_command_line, "clearcpuid", arg, sizeof(arg));
 	if (arglen <= 0)
 		return;
@@ -1632,6 +1643,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	x86_init_rdrand(c);
 
 	setup_pku(c);
+	setup_cet(c);
 
 	/*
 	 * Clear/Set all flags overridden by options, need do it
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 05/35] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states Date: Sun, 30 Jan 2022 13:18:08 -0800 Message-Id: <20220130211838.8382-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 68265120002 X-Stat-Signature: f39t13m5cxddnmsnou9cef6k37bko6w4 Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b="m/F2bNDR"; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf29.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-Rspam-User: nil X-HE-Tag: 1643577711-727323 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Control-flow Enforcement Technology (CET) introduces these MSRs: MSR_IA32_U_CET (user-mode CET settings), MSR_IA32_PL3_SSP (user-mode shadow stack pointer), MSR_IA32_PL0_SSP (kernel-mode shadow stack pointer), MSR_IA32_PL1_SSP (Privilege Level 1 shadow stack pointer), MSR_IA32_PL2_SSP (Privilege Level 2 shadow stack pointer), MSR_IA32_S_CET (kernel-mode CET settings), MSR_IA32_INT_SSP_TAB (exception shadow stack table). The two user-mode MSRs belong to XFEATURE_CET_USER. The first three of kernel-mode MSRs belong to XFEATURE_CET_KERNEL. Both XSAVES states are supervisor states. This means that there is no direct, unprivileged access to these states, making it harder for an attacker to subvert CET. For future ptrace() support, shadow stack address and MSR reserved bits are checked before written to the supervisor states. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v1: - Remove outdated reference to sigreturn checks on msr's. Yu-cheng v29: - Move CET MSR definition up in msr-index.h. Yu-cheng v28: - Add XFEATURE_MASK_CET_USER to XFEATURES_INIT_FPSTATE_HANDLED. Yu-cheng v25: - Update xsave_cpuid_features[]. Now CET XSAVES features depend on X86_FEATURE_SHSTK (vs. the software-defined X86_FEATURE_CET). 
 arch/x86/include/asm/fpu/types.h  | 23 +++++++++++++++++++++--
 arch/x86/include/asm/fpu/xstate.h |  6 ++++--
 arch/x86/include/asm/msr-index.h  | 20 ++++++++++++++++++++
 arch/x86/kernel/fpu/xstate.c      | 13 ++++++++++++-
 4 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index eb7cd1139d97..e2b21197661c 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
 	XFEATURE_PASID,
-	XFEATURE_RSRVD_COMP_11,
-	XFEATURE_RSRVD_COMP_12,
+	XFEATURE_CET_USER,
+	XFEATURE_CET_KERNEL,
 	XFEATURE_RSRVD_COMP_13,
 	XFEATURE_RSRVD_COMP_14,
 	XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
 #define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER		(1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL	(1 << XFEATURE_CET_KERNEL)
 #define XFEATURE_MASK_LBR		(1 << XFEATURE_LBR)
 #define XFEATURE_MASK_XTILE_CFG		(1 << XFEATURE_XTILE_CFG)
 #define XFEATURE_MASK_XTILE_DATA	(1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,23 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+	u64 user_cet;			/* user control-flow settings */
+	u64 user_ssp;			/* user shadow stack pointer */
+};
+
+/*
+ * State component 12 is Control-flow Enforcement kernel states
+ */
+struct cet_kernel_state {
+	u64 kernel_ssp;			/* kernel shadow stack */
+	u64 pl1_ssp;			/* privilege level 1 shadow stack */
+	u64 pl2_ssp;			/* privilege level 2 shadow stack */
+};
+
 /*
  * State component 15: Architectural LBR configuration state.
  * The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
 #define XFEATURE_MASK_USER_DYNAMIC	XFEATURE_MASK_XTILE_DATA
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+					    XFEATURE_MASK_CET_USER)
 
 /*
  * A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
  * Unsupported supervisor features. When a supervisor feature in this mask is
  * supported in the future, move it to the supported supervisor feature mask.
  */
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+					      XFEATURE_MASK_CET_KERNEL)
 
 /* All supervisor states including supported and unsupported states. */
 #define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
				      XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3faf0f97edb1..0ee77ce4c753 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -362,6 +362,26 @@
 
 #define MSR_CORE_PERF_LIMIT_REASONS	0x00000690
 
+/* Control-flow Enforcement Technology MSRs */
+#define MSR_IA32_U_CET			0x000006a0 /* user mode cet setting */
+#define MSR_IA32_S_CET			0x000006a2 /* kernel mode cet setting */
+#define CET_SHSTK_EN			BIT_ULL(0)
+#define CET_WRSS_EN			BIT_ULL(1)
+#define CET_ENDBR_EN			BIT_ULL(2)
+#define CET_LEG_IW_EN			BIT_ULL(3)
+#define CET_NO_TRACK_EN			BIT_ULL(4)
+#define CET_SUPPRESS_DISABLE		BIT_ULL(5)
+#define CET_RESERVED			(BIT_ULL(6) | BIT_ULL(7) | BIT_ULL(8) | BIT_ULL(9))
+#define CET_SUPPRESS			BIT_ULL(10)
+#define CET_WAIT_ENDBR			BIT_ULL(11)
+
+#define MSR_IA32_PL0_SSP		0x000006a4 /* kernel shadow stack pointer */
+#define MSR_IA32_PL1_SSP		0x000006a5 /* ring-1 shadow stack pointer */
+#define MSR_IA32_PL2_SSP		0x000006a6 /* ring-2 shadow stack pointer */
+#define MSR_IA32_PL3_SSP		0x000006a7 /* user shadow stack pointer */
+#define MSR_IA32_INT_SSP_TAB		0x000006a8 /* exception shadow stack table */
+
 #define MSR_GFX_PERF_LIMIT_REASONS	0x000006B0
 #define MSR_RING_PERF_LIMIT_REASONS	0x000006B1
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 02b3ddaf4f75..44397202762b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -50,6 +50,8 @@ static const char *xfeature_names[] =
 	"Processor Trace (unused)"	,
 	"Protection Keys User registers",
 	"PASID state",
+	"Control-flow User registers"	,
+	"Control-flow Kernel registers"	,
 	"unknown xstate feature"	,
 	"unknown xstate feature"	,
 	"unknown xstate feature"	,
@@ -73,6 +75,8 @@ static unsigned short xsave_cpuid_features[] __initdata = {
 	[XFEATURE_PT_UNIMPLEMENTED_SO_FAR]	= X86_FEATURE_INTEL_PT,
 	[XFEATURE_PKRU]				= X86_FEATURE_PKU,
 	[XFEATURE_PASID]			= X86_FEATURE_ENQCMD,
+	[XFEATURE_CET_USER]			= X86_FEATURE_SHSTK,
+	[XFEATURE_CET_KERNEL]			= X86_FEATURE_SHSTK,
 	[XFEATURE_XTILE_CFG]			= X86_FEATURE_AMX_TILE,
 	[XFEATURE_XTILE_DATA]			= X86_FEATURE_AMX_TILE,
 };
@@ -250,6 +254,8 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
 	print_xstate_feature(XFEATURE_MASK_PASID);
+	print_xstate_feature(XFEATURE_MASK_CET_USER);
+	print_xstate_feature(XFEATURE_MASK_CET_KERNEL);
 	print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
 	print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
 }
@@ -405,6 +411,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
 	 XFEATURE_MASK_BNDREGS |		\
 	 XFEATURE_MASK_BNDCSR |			\
 	 XFEATURE_MASK_PASID |			\
+	 XFEATURE_MASK_CET_USER |		\
 	 XFEATURE_MASK_XTILE)
 
 /*
@@ -621,6 +628,8 @@ static bool __init check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
 	XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_USER,   struct cet_user_state);
+	XCHECK_SZ(sz, nr, XFEATURE_CET_KERNEL, struct cet_kernel_state);
 
 	/* The tile data size varies between implementations. */
 	if (nr == XFEATURE_XTILE_DATA)
@@ -634,7 +643,9 @@ static bool __init check_xstate_against_struct(int nr)
 	if ((nr < XFEATURE_YMM) ||
 	    (nr >= XFEATURE_MAX) ||
 	    (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
-	    ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+	    (nr == XFEATURE_RSRVD_COMP_13) ||
+	    (nr == XFEATURE_RSRVD_COMP_14) ||
+	    (nr == XFEATURE_RSRVD_COMP_16)) {
 		WARN_ONCE(1, "no structure for xstate: %d\n", nr);
 		XSTATE_WARN_ON(1);
 		return false;
Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu , Michael Kerrisk Subject: [PATCH 06/35] x86/cet: Add control-protection fault handler Date: Sun, 30 Jan 2022 13:18:09 -0800 Message-Id: <20220130211838.8382-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspam-User: nil X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 26A70A0007 X-Stat-Signature: qmga8b9a6kqftxuz3bepppufzu6t6fzp Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=XWm5e+2y; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf15.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-HE-Tag: 1643577711-37249 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack; or an indirect JMP instruction, without the NOTRACK prefix, arrives at a non-ENDBR opcode. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook Cc: Michael Kerrisk --- v1: - Update static asserts for NSIGSEGV Yu-cheng v29: - Remove pr_emerg() since it is followed by die(). - Change boot_cpu_has() to cpu_feature_enabled(). Yu-cheng v25: - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK. - Change X86_FEATURE_CET to X86_FEATURE_SHSTK. 
 arch/arm/kernel/signal.c           |  2 +-
 arch/arm64/kernel/signal.c         |  2 +-
 arch/arm64/kernel/signal32.c       |  2 +-
 arch/sparc/kernel/signal32.c       |  2 +-
 arch/sparc/kernel/signal_64.c      |  2 +-
 arch/x86/include/asm/idtentry.h    |  4 ++
 arch/x86/kernel/idt.c              |  4 ++
 arch/x86/kernel/signal_compat.c    |  2 +-
 arch/x86/kernel/traps.c            | 62 ++++++++++++++++++++++++++++++
 include/uapi/asm-generic/siginfo.h |  3 +-
 10 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index c532a6041066..59aaadce9d52 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -681,7 +681,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index d8aaf4b6f432..d2da57c415b8 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -983,7 +983,7 @@ void __init minsigstksz_setup(void)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index d984282b979f..8776a34c6444 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/sparc/kernel/signal32.c b/arch/sparc/kernel/signal32.c
index 6cc124a3bb98..dc50b2a78692 100644
--- a/arch/sparc/kernel/signal32.c
+++ b/arch/sparc/kernel/signal32.c
@@ -752,7 +752,7 @@ asmlinkage int do_sys32_sigstack(u32 u_ssptr, u32 u_ossptr, unsigned long sp)
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/sparc/kernel/signal_64.c b/arch/sparc/kernel/signal_64.c
index 2a78d2af1265..7fe2bd37bd1a 100644
--- a/arch/sparc/kernel/signal_64.c
+++ b/arch/sparc/kernel/signal_64.c
@@ -562,7 +562,7 @@ void do_notify_resume(struct pt_regs *regs, unsigned long orig_i0, unsigned long
  */
 static_assert(NSIGILL == 11);
 static_assert(NSIGFPE == 15);
-static_assert(NSIGSEGV == 9);
+static_assert(NSIGSEGV == 10);
 static_assert(NSIGBUS == 5);
 static_assert(NSIGTRAP == 6);
 static_assert(NSIGCHLD == 6);
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e9902..a90791433152 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -562,6 +562,10 @@ DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_SS,	exc_stack_segment);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_GP,	exc_general_protection);
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC,	exc_alignment_check);
 
+#ifdef CONFIG_X86_SHADOW_STACK
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
+#endif
+
 /* Raw exception entries which need extra work */
 DECLARE_IDTENTRY_RAW(X86_TRAP_UD,	exc_invalid_op);
 DECLARE_IDTENTRY_RAW(X86_TRAP_BP,	exc_int3);
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index df0fa695bb09..9f1bdaabc246 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -113,6 +113,10 @@ static const __initconst struct idt_data def_idts[] = {
 #elif defined(CONFIG_X86_32)
 	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_32),
 #endif
+
+#ifdef CONFIG_X86_SHADOW_STACK
+	INTG(X86_TRAP_CP,		asm_exc_control_protection),
+#endif
 };
 
 /*
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index b52407c56000..ff50cd978ea5 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -27,7 +27,7 @@ static inline void signal_compat_build_tests(void)
 	 */
 	BUILD_BUG_ON(NSIGILL  != 11);
 	BUILD_BUG_ON(NSIGFPE  != 15);
-	BUILD_BUG_ON(NSIGSEGV != 9);
+	BUILD_BUG_ON(NSIGSEGV != 10);
 	BUILD_BUG_ON(NSIGBUS  != 5);
 	BUILD_BUG_ON(NSIGTRAP != 6);
 	BUILD_BUG_ON(NSIGCHLD != 6);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..54b7a146fd5e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -641,6 +642,67 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#ifdef CONFIG_X86_SHADOW_STACK
+static const char * const control_protection_err[] = {
+	"unknown",
+	"near-ret",
+	"far-ret/iret",
+	"endbranch",
+	"rstorssp",
+	"setssbsy",
+	"unknown",
+};
+
+static DEFINE_RATELIMIT_STATE(cpf_rate, DEFAULT_RATELIMIT_INTERVAL,
+			      DEFAULT_RATELIMIT_BURST);
+
+/*
+ * When a control protection exception occurs, send a signal to the responsible
+ * application. Currently, control protection is only enabled for user mode.
+ * This exception should not come from kernel mode.
+ */
+DEFINE_IDTENTRY_ERRORCODE(exc_control_protection)
+{
+	struct task_struct *tsk;
+
+	if (!user_mode(regs)) {
+		die("kernel control protection fault", regs, error_code);
+		panic("Unexpected kernel control protection fault. Machine halted.");
+	}
+
+	cond_local_irq_enable(regs);
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		WARN_ONCE(1, "Control protection fault with CET support disabled\n");
+
+	tsk = current;
+	tsk->thread.error_code = error_code;
+	tsk->thread.trap_nr = X86_TRAP_CP;
+
+	/*
+	 * Ratelimit to prevent log spamming.
+	 */
+	if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
+	    __ratelimit(&cpf_rate)) {
+		unsigned long ssp;
+		int cpf_type;
+
+		cpf_type = array_index_nospec(error_code, ARRAY_SIZE(control_protection_err));
+
+		rdmsrl(MSR_IA32_PL3_SSP, ssp);
+		pr_emerg("%s[%d] control protection ip:%lx sp:%lx ssp:%lx error:%lx(%s)",
+			 tsk->comm, task_pid_nr(tsk),
+			 regs->ip, regs->sp, ssp, error_code,
+			 control_protection_err[cpf_type]);
+		print_vma_addr(KERN_CONT " in ", regs->ip);
+		pr_cont("\n");
+	}
+
+	force_sig_fault(SIGSEGV, SEGV_CPERR, (void __user *)0);
+	cond_local_irq_disable(regs);
+}
+#endif
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index 3ba180f550d7..081f4b37d22c 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -240,7 +240,8 @@ typedef struct siginfo {
 #define SEGV_ADIPERR	7	/* Precise MCD exception */
 #define SEGV_MTEAERR	8	/* Asynchronous ARM MTE error */
 #define SEGV_MTESERR	9	/* Synchronous ARM MTE exception */
-#define NSIGSEGV	9
+#define SEGV_CPERR	10	/* Control protection fault */
+#define NSIGSEGV	10
 
 /*
  * SIGBUS si_codes
From patchwork Sun Jan 30 21:18:10 2022
From: Rick Edgecombe
Subject: [PATCH 07/35] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
Date: Sun, 30 Jan 2022 13:18:10 -0800
Message-Id: <20220130211838.8382-8-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The x86 family of processors does not directly create read-only and
Dirty PTEs. These PTEs are created by software. One such case is that
kernel read-only pages are historically set up as Dirty.

New processors that support Shadow Stack regard read-only and Dirty PTEs
as shadow stack pages. This results in ambiguity between shadow stack
and kernel read-only pages. To resolve this, remove Dirty from kernel
read-only pages.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: "H. Peter Anvin"
Cc: Kees Cook
Cc: Thomas Gleixner
Cc: Dave Hansen
Cc: Christoph Hellwig
Cc: Andy Lutomirski
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Peter Zijlstra
---
 arch/x86/include/asm/pgtable_types.h | 6 +++---
 arch/x86/mm/pat/set_memory.c         | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..3781a79b6388 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -190,10 +190,10 @@ enum page_cache_mode {
 #define _KERNPG_TABLE		(__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
 #define _PAGE_TABLE_NOENC	(__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE		(__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
-#define __PAGE_KERNEL_RO	(__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_ROX	(__PP|   0|   0|___A|   0|___D|   0|___G)
+#define __PAGE_KERNEL_RO	(__PP|   0|   0|___A|__NX|   0|   0|___G)
+#define __PAGE_KERNEL_ROX	(__PP|   0|   0|___A|   0|   0|   0|___G)
 #define __PAGE_KERNEL_NOCACHE	(__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
-#define __PAGE_KERNEL_VVAR	(__PP|   0|_USR|___A|__NX|___D|   0|___G)
+#define __PAGE_KERNEL_VVAR	(__PP|   0|_USR|___A|__NX|   0|   0|___G)
 #define __PAGE_KERNEL_LARGE	(__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
 #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
 #define __PAGE_KERNEL_WP	(__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index b4072115c8ef..844bb30280b7 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1943,7 +1943,7 @@ int set_memory_nx(unsigned long addr, int numpages)

 int set_memory_ro(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW | _PAGE_DIRTY), 0);
 }

 int set_memory_rw(unsigned long addr, int numpages)
From patchwork Sun Jan 30 21:18:11 2022
From: Rick Edgecombe
Subject: [PATCH 08/35] x86/mm: Move pmd_write(), pud_write() up in the file
Date: Sun, 30 Jan 2022 13:18:11 -0800
Message-Id: <20220130211838.8382-9-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

To prepare for the introduction of _PAGE_COW, move pmd_write() and
pud_write() up in the file, so that they can be used by other helpers
below. No functional changes.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/pgtable.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8a9432fb3802..aff5e489ff17 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -158,6 +158,18 @@ static inline int pte_write(pte_t pte)
 	return pte_flags(pte) & _PAGE_RW;
 }

+#define pmd_write pmd_write
+static inline int pmd_write(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_RW;
+}
+
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_RW;
+}
+
 static inline int pte_huge(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_PSE;
@@ -1116,12 +1128,6 @@ extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
 				  unsigned long address, pmd_t *pmdp);

-#define pmd_write pmd_write
-static inline int pmd_write(pmd_t pmd)
-{
-	return pmd_flags(pmd) & _PAGE_RW;
-}
-
 #define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 					    unsigned long addr, pmd_t *pmdp)
@@ -1151,12 +1157,6 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }

-#define pud_write pud_write
-static inline int pud_write(pud_t pud)
-{
-	return pud_flags(pud) & _PAGE_RW;
-}
-
 #ifndef pmdp_establish
 #define pmdp_establish pmdp_establish
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
From patchwork Sun Jan 30 21:18:12 2022
From: Rick Edgecombe
Subject: [PATCH 09/35] x86/mm: Introduce _PAGE_COW
Date: Sun, 30 Jan 2022 13:18:12 -0800
Message-Id: <20220130211838.8382-10-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

There is essentially no room left in the x86 hardware PTEs on some OSes
(not Linux). That left the hardware architects looking for a way to
represent a new memory type (shadow stack) within the existing bits.
They chose to repurpose a lightly-used state: Write=0, Dirty=1. The
reason it's lightly used is that Dirty=1 is normally set by hardware,
and hardware will not normally set Dirty=1 on a Write=0 PTE. Software
must normally be involved to create one of these PTEs, so software can
simply opt to not create them.

In places where Linux normally creates Write=0, Dirty=1, it can use the
software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In
other words, whenever Linux needs to create Write=0, Dirty=1, it instead
creates Write=0, Cow=1, except for shadow stack, which is Write=0,
Dirty=1. This clearly separates shadow stack from other data, and
results in the following:

(a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
(b) A R/O page that has been COW'ed: (Write=0, Cow=1)
    The user page is in a R/O VMA, and get_user_pages() needs a writable
    copy. The page fault handler creates a copy of the page and sets the
    new copy's PTE as Write=0 and Cow=1.
(c) A shadow stack PTE: (Write=0, Dirty=1)
(d) A shared shadow stack PTE: (Write=0, Cow=1)
    When a shadow stack page is being shared among processes (this
    happens at fork()), its PTE is made Dirty=0, so the next shadow
    stack access causes a fault, and the page is duplicated and Dirty=1
    is set again. This is the COW equivalent for shadow stack pages,
    even though it's copy-on-access rather than copy-on-write.
(e) A page where the processor observed a Write=1 PTE, started a write,
    set Dirty=1, but then observed a Write=0 PTE. That's possible today,
    but will not happen on processors that support shadow stack.

Define _PAGE_COW, update the pte_*() helpers, and apply the same changes
to the pmd and pud counterparts. After this, there are six free bits
left in the 64-bit PTE, and no more free bits in the 32-bit PTE (except
for PAE); Shadow Stack is not implemented for the 32-bit kernel.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/pgtable.h       | 196 ++++++++++++++++++++++++---
 arch/x86/include/asm/pgtable_types.h |  42 +++++-
 2 files changed, 217 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index aff5e489ff17..a4a75e78a934 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -123,9 +123,20 @@ extern pmdval_t early_pmd_flags;
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
  */
-static inline int pte_dirty(pte_t pte)
+static inline bool pte_dirty(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_DIRTY;
+	/*
+	 * A dirty PTE has Dirty=1 or Cow=1.
+	 */
+	return pte_flags(pte) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pte_shstk(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }

 static inline int pte_young(pte_t pte)
@@ -133,9 +144,20 @@ static inline int pte_young(pte_t pte)
 	return pte_flags(pte) & _PAGE_ACCESSED;
 }

-static inline int pmd_dirty(pmd_t pmd)
+static inline bool pmd_dirty(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_DIRTY;
+	/*
+	 * A dirty PMD has Dirty=1 or Cow=1.
+	 */
+	return pmd_flags(pmd) & _PAGE_DIRTY_BITS;
+}
+
+static inline bool pmd_shstk(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return false;
+
+	return (pmd_flags(pmd) & (_PAGE_RW | _PAGE_DIRTY)) == _PAGE_DIRTY;
 }

 static inline int pmd_young(pmd_t pmd)
@@ -143,9 +165,12 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }

-static inline int pud_dirty(pud_t pud)
+static inline bool pud_dirty(pud_t pud)
 {
-	return pud_flags(pud) & _PAGE_DIRTY;
+	/*
+	 * A dirty PUD has Dirty=1 or Cow=1.
+	 */
+	return pud_flags(pud) & _PAGE_DIRTY_BITS;
 }

 static inline int pud_young(pud_t pud)
@@ -155,13 +180,23 @@ static inline int pud_young(pud_t pud)

 static inline int pte_write(pte_t pte)
 {
-	return pte_flags(pte) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are always writable - but not by normal
+	 * instructions, and only by shadow stack operations. Therefore,
+	 * the W=0,D=1 test with pte_shstk().
+	 */
+	return (pte_flags(pte) & _PAGE_RW) || pte_shstk(pte);
 }

 #define pmd_write pmd_write
 static inline int pmd_write(pmd_t pmd)
 {
-	return pmd_flags(pmd) & _PAGE_RW;
+	/*
+	 * Shadow stack pages are always writable - but not by normal
+	 * instructions, and only by shadow stack operations. Therefore,
+	 * the W=0,D=1 test with pmd_shstk().
+	 */
+	return (pmd_flags(pmd) & _PAGE_RW) || pmd_shstk(pmd);
 }

 #define pud_write pud_write
@@ -299,6 +334,24 @@ static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 	return native_make_pte(v & ~clear);
 }

+static inline pte_t pte_mkcow(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	pte = pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_set_flags(pte, _PAGE_COW);
+}
+
+static inline pte_t pte_clear_cow(pte_t pte)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pte;
+
+	pte = pte_set_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pte_uffd_wp(pte_t pte)
 {
@@ -318,7 +371,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)

 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_DIRTY);
+	return pte_clear_flags(pte, _PAGE_DIRTY_BITS);
 }

 static inline pte_t pte_mkold(pte_t pte)
@@ -328,7 +381,16 @@ static inline pte_t pte_mkold(pte_t pte)

 static inline pte_t pte_wrprotect(pte_t pte)
 {
-	return pte_clear_flags(pte, _PAGE_RW);
+	pte = pte_clear_flags(pte, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PTE (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pte_dirty(pte))
+		pte = pte_mkcow(pte);
+	return pte;
 }

 static inline pte_t pte_mkexec(pte_t pte)
@@ -338,7 +400,18 @@ static inline pte_t pte_mkexec(pte_t pte)

 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pteval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PTEs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pte_write(pte))
+		dirty = _PAGE_COW;
+
+	return pte_set_flags(pte, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+	return pte_clear_cow(pte);
 }

 static inline pte_t pte_mkyoung(pte_t pte)
@@ -348,7 +421,12 @@ static inline pte_t pte_mkyoung(pte_t pte)

 static inline pte_t pte_mkwrite(pte_t pte)
 {
-	return pte_set_flags(pte, _PAGE_RW);
+	pte = pte_set_flags(pte, _PAGE_RW);
+
+	if (pte_dirty(pte))
+		pte = pte_clear_cow(pte);
+
+	return pte;
 }

 static inline pte_t pte_mkhuge(pte_t pte)
@@ -395,6 +473,24 @@ static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 	return native_make_pmd(v & ~clear);
 }

+static inline pmd_t pmd_mkcow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_set_flags(pmd, _PAGE_COW);
+}
+
+static inline pmd_t pmd_clear_cow(pmd_t pmd)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pmd;
+
+	pmd = pmd_set_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_COW);
+}
+
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 static inline int pmd_uffd_wp(pmd_t pmd)
 {
@@ -419,17 +515,36 @@ static inline pmd_t pmd_mkold(pmd_t pmd)

 static inline pmd_t pmd_mkclean(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+	return pmd_clear_flags(pmd, _PAGE_DIRTY_BITS);
 }

 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
-	return pmd_clear_flags(pmd, _PAGE_RW);
+	pmd = pmd_clear_flags(pmd, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PMD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pmd_dirty(pmd))
+		pmd = pmd_mkcow(pmd);
+	return pmd;
 }

 static inline pmd_t pmd_mkdirty(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pmdval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PMDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pmd_write(pmd))
+		dirty = _PAGE_COW;
+
+	return pmd_set_flags(pmd, dirty | _PAGE_SOFT_DIRTY);
+}
+
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+	return pmd_clear_cow(pmd);
 }

 static inline pmd_t pmd_mkdevmap(pmd_t pmd)
@@ -449,7 +564,11 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)

 static inline pmd_t pmd_mkwrite(pmd_t pmd)
 {
-	return pmd_set_flags(pmd, _PAGE_RW);
+	pmd = pmd_set_flags(pmd, _PAGE_RW);
+
+	if (pmd_dirty(pmd))
+		pmd = pmd_clear_cow(pmd);
+	return pmd;
 }

 static inline pud_t pud_set_flags(pud_t pud, pudval_t set)
@@ -466,6 +585,24 @@ static inline pud_t pud_clear_flags(pud_t pud, pudval_t clear)
 	return native_make_pud(v & ~clear);
 }

+static inline pud_t pud_mkcow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_set_flags(pud, _PAGE_COW);
+}
+
+static inline pud_t pud_clear_cow(pud_t pud)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return pud;
+
+	pud = pud_set_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_COW);
+}
+
 static inline pud_t pud_mkold(pud_t pud)
 {
 	return pud_clear_flags(pud, _PAGE_ACCESSED);
@@ -473,17 +610,32 @@ static inline pud_t pud_mkold(pud_t pud)

 static inline pud_t pud_mkclean(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_DIRTY);
+	return pud_clear_flags(pud, _PAGE_DIRTY_BITS);
 }

 static inline pud_t pud_wrprotect(pud_t pud)
 {
-	return pud_clear_flags(pud, _PAGE_RW);
+	pud = pud_clear_flags(pud, _PAGE_RW);
+
+	/*
+	 * Blindly clearing _PAGE_RW might accidentally create
+	 * a shadow stack PUD (RW=0, Dirty=1). Move the hardware
+	 * dirty value to the software bit.
+	 */
+	if (pud_dirty(pud))
+		pud = pud_mkcow(pud);
+	return pud;
 }

 static inline pud_t pud_mkdirty(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
+	pudval_t dirty = _PAGE_DIRTY;
+
+	/* Avoid creating (HW)Dirty=1, Write=0 PUDs */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK) && !pud_write(pud))
+		dirty = _PAGE_COW;
+
+	return pud_set_flags(pud, dirty | _PAGE_SOFT_DIRTY);
 }

 static inline pud_t pud_mkdevmap(pud_t pud)
@@ -503,7 +655,11 @@ static inline pud_t pud_mkyoung(pud_t pud)

 static inline pud_t pud_mkwrite(pud_t pud)
 {
-	return pud_set_flags(pud, _PAGE_RW);
+	pud = pud_set_flags(pud, _PAGE_RW);
+
+	if (pud_dirty(pud))
+		pud = pud_clear_cow(pud);
+	return pud;
 }

 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3781a79b6388..1bfab70ff9ac 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -21,7 +21,8 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SOFTW4	58	/* available for programmer */
+#define _PAGE_BIT_SOFTW4	57	/* available for programmer */
+#define _PAGE_BIT_SOFTW5	58	/* available for programmer */
 #define _PAGE_BIT_PKEY_BIT0	59	/* Protection Keys, bit 1/4 */
 #define _PAGE_BIT_PKEY_BIT1	60	/* Protection Keys, bit 2/4 */
 #define _PAGE_BIT_PKEY_BIT2	61	/* Protection Keys, bit 3/4 */
@@ -34,6 +35,15 @@
 #define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
 #define _PAGE_BIT_DEVMAP	_PAGE_BIT_SOFTW4

+/*
+ * Indicates a copy-on-write page.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_BIT_COW		_PAGE_BIT_SOFTW5 /* copy-on-write */
+#else
+#define _PAGE_BIT_COW		0
+#endif
+
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
 #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
@@ -115,6 +125,36 @@
 #define _PAGE_DEVMAP	(_AT(pteval_t, 0))
 #endif

+/*
+ * The hardware requires shadow stack to be read-only and Dirty.
+ * _PAGE_COW is a software-only bit used to separate copy-on-write PTEs
+ * from shadow stack PTEs:
+ * (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
+ * (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
+ *     The user page is in a R/O VMA, and get_user_pages() needs a
+ *     writable copy. The page fault handler creates a copy of the page
+ *     and sets the new copy's PTE as Write=0, Cow=1.
+ * (c) A shadow stack PTE: (Write=0, Dirty=1)
+ * (d) A shared (copy-on-access) shadow stack PTE: (Write=0, Cow=1)
+ *     When a shadow stack page is being shared among processes (this
+ *     happens at fork()), its PTE is cleared of _PAGE_DIRTY, so the next
+ *     shadow stack access causes a fault, and the page is duplicated and
+ *     _PAGE_DIRTY is set again. This is the COW equivalent for shadow
+ *     stack pages, even though it's copy-on-access rather than
+ *     copy-on-write.
+ * (e) A page where the processor observed a Write=1 PTE, started a write,
+ *     set Dirty=1, but then observed a Write=0 PTE (changed by another
+ *     thread). That's possible today, but will not happen on processors
+ *     that support shadow stack.
+ */
+#ifdef CONFIG_X86_SHADOW_STACK
+#define _PAGE_COW	(_AT(pteval_t, 1) << _PAGE_BIT_COW)
+#else
+#define _PAGE_COW	(_AT(pteval_t, 0))
+#endif
+
+#define _PAGE_DIRTY_BITS (_PAGE_DIRTY | _PAGE_COW)
+
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
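To make the state table above concrete, here is a small standalone C program; it is illustrative only, with simplified stand-in bit positions rather than the kernel's, and it mimics what pte_wrprotect() does: move the hardware Dirty bit to the software Cow bit so that write-protecting an ordinary dirty page never produces the shadow stack encoding Write=0, Dirty=1.

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel flags; positions are illustrative. */
#define PAGE_RW    (1u << 1)
#define PAGE_DIRTY (1u << 6)
#define PAGE_COW   (1u << 10)

static bool is_shstk(uint32_t pte)
{
	/* Shadow stack encoding: Write=0, Dirty=1 */
	return (pte & (PAGE_RW | PAGE_DIRTY)) == PAGE_DIRTY;
}

static uint32_t wrprotect(uint32_t pte)
{
	pte &= ~PAGE_RW;
	/* Move hardware Dirty to the software Cow bit, as pte_wrprotect() does */
	if (pte & PAGE_DIRTY) {
		pte &= ~PAGE_DIRTY;
		pte |= PAGE_COW;
	}
	return pte;
}

int main(void)
{
	uint32_t pte = PAGE_RW | PAGE_DIRTY;	/* an ordinary written-to page */

	pte = wrprotect(pte);			/* e.g. fork() write-protecting it */
	printf("shadow stack? %s (Cow=%u Dirty=%u)\n",
	       is_shstk(pte) ? "yes" : "no",
	       !!(pte & PAGE_COW), !!(pte & PAGE_DIRTY));
	return 0;
}

Running this prints "shadow stack? no (Cow=1 Dirty=0)": the dirtiness survives, but in the software bit, so the PTE is never mistaken for a shadow stack page.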
From patchwork Sun Jan 30 21:18:13 2022
From: Rick Edgecombe
Subject: [PATCH 10/35] drm/i915/gvt: Change _PAGE_DIRTY to _PAGE_DIRTY_BITS
Date: Sun, 30 Jan 2022 13:18:13 -0800
Message-Id: <20220130211838.8382-11-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

After the introduction of _PAGE_COW, a modified page's PTE can have
either _PAGE_DIRTY or _PAGE_COW. Change _PAGE_DIRTY to _PAGE_DIRTY_BITS.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: David Airlie
Cc: Joonas Lahtinen
Cc: Jani Nikula
Cc: Daniel Vetter
Cc: Rodrigo Vivi
Cc: Zhenyu Wang
Cc: Zhi Wang
---
 drivers/gpu/drm/i915/gvt/gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 99d1781fa5f0..75ce4e823902 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1210,7 +1210,7 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
 	}

 	/* Clear dirty field. */
-	se->val64 &= ~_PAGE_DIRTY;
+	se->val64 &= ~_PAGE_DIRTY_BITS;

 	ops->clear_pse(se);
 	ops->clear_ips(se);
From patchwork Sun Jan 30 21:18:14 2022
From: Rick Edgecombe
Subject: [PATCH 11/35] x86/mm: Update pte_modify for _PAGE_COW
Date: Sun, 30 Jan 2022 13:18:14 -0800
Message-Id: <20220130211838.8382-12-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The read-only and Dirty PTE has been used to indicate copy-on-write
pages. However, newer x86 processors also regard a read-only and Dirty
PTE as a shadow stack page. In order to separate the two, the
software-defined _PAGE_COW is created to replace _PAGE_DIRTY for the
copy-on-write case, and the pte_*() helpers are updated.

pte_modify() changes a PTE to 'newprot', but it doesn't use the pte_*()
helpers. Introduce fixup_dirty_pte(), which sets a dirty PTE, based on
_PAGE_RW, to either _PAGE_DIRTY or _PAGE_COW. Apply the same changes to
pmd_modify().

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/pgtable.h | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a4a75e78a934..5c3886f6ccda 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -773,6 +773,23 @@ static inline pmd_t pmd_mkinvalid(pmd_t pmd)

 static inline u64 flip_protnone_guard(u64 oldval, u64 val, u64 mask);

+static inline pteval_t fixup_dirty_pte(pteval_t pteval)
+{
+	pte_t pte = __pte(pteval);
+
+	/*
+	 * Fix up potential shadow stack page flags because the RO, Dirty
+	 * PTE is special.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pte_dirty(pte)) {
+			pte = pte_mkclean(pte);
+			pte = pte_mkdirty(pte);
+		}
+	}
+	return pte_val(pte);
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	pteval_t val = pte_val(pte), oldval = val;
@@ -783,16 +800,36 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	 */
 	val &= _PAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_PAGE_CHG_MASK;
+	val = fixup_dirty_pte(val);
 	val = flip_protnone_guard(oldval, val, PTE_PFN_MASK);
 	return __pte(val);
 }

+static inline int pmd_write(pmd_t pmd);
+static inline pmdval_t fixup_dirty_pmd(pmdval_t pmdval)
+{
+	pmd_t pmd = __pmd(pmdval);
+
+	/*
+	 * Fix up potential shadow stack page flags because the RO, Dirty
+	 * PMD is special.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		if (pmd_dirty(pmd)) {
+			pmd = pmd_mkclean(pmd);
+			pmd = pmd_mkdirty(pmd);
+		}
+	}
+	return pmd_val(pmd);
+}
+
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
 	pmdval_t val = pmd_val(pmd), oldval = val;

 	val &= _HPAGE_CHG_MASK;
 	val |= check_pgprot(newprot) & ~_HPAGE_CHG_MASK;
+	val = fixup_dirty_pmd(val);
 	val = flip_protnone_guard(oldval, val, PHYSICAL_PMD_PAGE_MASK);
 	return __pmd(val);
 }
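For context, the path fixed above is exercised by a plain mprotect() call, which reaches pte_modify() via change_protection(). A small userspace sketch (illustrative only) of the sequence during which the fixup must choose between _PAGE_DIRTY and _PAGE_COW:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, psz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	memset(p, 0xaa, psz);	/* write to the page: hardware sets Dirty=1 */

	/*
	 * pte_modify() runs under this call. Without the fixup, the result
	 * could be Write=0, Dirty=1 -- the shadow stack encoding. With it,
	 * the PTE becomes Write=0, Cow=1 instead.
	 */
	mprotect(p, psz, PROT_READ);

	printf("page is now read-only, data intact: %c\n",
	       p[0] == (char)0xaa ? 'y' : 'n');
	munmap(p, psz);
	return 0;
}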
From patchwork Sun Jan 30 21:18:15 2022
From: Rick Edgecombe
Subject: [PATCH 12/35] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW
Date: Sun, 30 Jan 2022 13:18:15 -0800
Message-Id: <20220130211838.8382-13-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

When Shadow Stack is introduced, the [R/O + _PAGE_DIRTY] PTE is reserved
for shadow stack. Copy-on-write PTEs have [R/O + _PAGE_COW].

When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
become a transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up
seeing a read-only PTE by the time they get to the Dirty bit, creating a
transient shadow stack PTE. However, this will not occur on processors
supporting Shadow Stack, and a TLB flush is not necessary.

The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW
non-atomically, a transient shadow stack PTE can be created as a result.
Thus, prevent that with cmpxchg.

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided
many insights to the issue. Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---
Yu-cheng v30:
- Replace (pmdval_t) cast with CONFIG_PGTABLE_LEVELS > 2 (Borislav
  Petkov).

 arch/x86/include/asm/pgtable.h | 38 ++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5c3886f6ccda..e1061b9cba6a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1295,6 +1295,24 @@ static inline void ptep_clear(struct mm_struct *mm, unsigned long addr,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	/*
+	 * If Shadow Stack is enabled, pte_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pte_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PTE and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PTE is RW=1, Dirty=1 now. Use try_cmpxchg() to detect
+	 * PTE changes and update old_pte, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pte_t old_pte, new_pte;
+
+		old_pte = READ_ONCE(*ptep);
+		do {
+			new_pte = pte_wrprotect(old_pte);
+		} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+		return;
+	}
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
 }

@@ -1347,6 +1365,26 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+#if CONFIG_PGTABLE_LEVELS > 2
+	/*
+	 * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
+	 * to _PAGE_COW (see comments at pmd_wrprotect()).
+	 * When a thread reads a RW=1, Dirty=0 PMD and before changing it
+	 * to RW=0, Dirty=0, another thread could have written to the page
+	 * and the PMD is RW=1, Dirty=1 now. Use try_cmpxchg() to detect
+	 * PMD changes and update old_pmd, then try again.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+		pmd_t old_pmd, new_pmd;
+
+		old_pmd = READ_ONCE(*pmdp);
+		do {
+			new_pmd = pmd_wrprotect(old_pmd);
+		} while (!try_cmpxchg(&pmdp->pmd, &old_pmd.pmd, new_pmd.pmd));
+
+		return;
+	}
+#endif
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
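The retry pattern used above is ordinary compare-and-exchange. A standalone C11 sketch of the same shape (illustrative only, with simplified flag values) shows why a Dirty bit set concurrently by another thread cannot slip through and leave a transient RW=0, Dirty=1 value behind:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define RW    (1u << 1)
#define DIRTY (1u << 6)
#define COW   (1u << 10)

/* Write-protect a PTE value, moving Dirty to Cow, like pte_wrprotect(). */
static uint64_t wrprotect(uint64_t v)
{
	v &= ~RW;
	if (v & DIRTY)
		v = (v & ~DIRTY) | COW;
	return v;
}

int main(void)
{
	_Atomic uint64_t pte = RW;	/* clean, writable */
	uint64_t old, new;

	old = atomic_load(&pte);
	do {
		new = wrprotect(old);
		/*
		 * If another thread set Dirty between the load and here,
		 * the compare-exchange fails and reloads 'old', so the
		 * Dirty bit is re-examined instead of being clobbered
		 * into an RW=0, Dirty=1 state.
		 */
	} while (!atomic_compare_exchange_weak(&pte, &old, new));

	printf("final pte: %#llx\n", (unsigned long long)atomic_load(&pte));
	return 0;
}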
From patchwork Sun Jan 30 21:18:16 2022
From: Rick Edgecombe
Subject: [PATCH 13/35] mm: Move VM_UFFD_MINOR_BIT from 37 to 38
Date: Sun, 30 Jan 2022 13:18:16 -0800
Message-Id: <20220130211838.8382-14-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

To introduce VM_SHADOW_STACK as VM_HIGH_ARCH_BIT_5 (bit 37), and to keep
all VM_HIGH_ARCH_BITs together, move VM_UFFD_MINOR_BIT from 37 to 38.
Signed-off-by: Yu-cheng Yu
Reviewed-by: Axel Rasmussen
Signed-off-by: Rick Edgecombe
Cc: Peter Xu
Cc: Mike Kravetz
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e1a84b1e6787..2e74c0ab6d25 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -359,7 +359,7 @@ extern unsigned int kobjsize(const void *objp);
 #endif

 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT	37
+# define VM_UFFD_MINOR_BIT	38
 # define VM_UFFD_MINOR		BIT(VM_UFFD_MINOR_BIT)	/* UFFD minor faults */
 #else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 # define VM_UFFD_MINOR		VM_NONE
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 14/35] mm: Introduce VM_SHADOW_STACK for shadow stack memory Date: Sun, 30 Jan 2022 13:18:17 -0800 Message-Id: <20220130211838.8382-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: CFBB340005 X-Rspam-User: nil Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=FWDQgYul; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf27.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-Stat-Signature: 3t179bbcjposbptdqiwen7ufcfprqj39 X-HE-Tag: 1643577717-728578 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu A shadow stack PTE must be read-only and have _PAGE_DIRTY set. However, read-only and Dirty PTEs also exist for copy-on-write (COW) pages. These two cases are handled differently for page faults. Introduce VM_SHADOW_STACK to track shadow stack VMAs. Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook --- Documentation/filesystems/proc.rst | 1 + arch/x86/mm/mmap.c | 2 ++ fs/proc/task_mmu.c | 3 +++ include/linux/mm.h | 8 ++++++++ 4 files changed, 14 insertions(+) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 061744c436d9..3f8c0fbb9cb3 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -555,6 +555,7 @@ encoded manner. 
The codes are the following: mt arm64 MTE allocation tags are enabled um userfaultfd missing tracking uw userfaultfd wr-protect tracking + ss shadow stack page == ======================================= Note that there is no guarantee that every flag and associated mnemonic will diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index c90c20904a60..f3f52c5e2fd6 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -165,6 +165,8 @@ unsigned long get_mmap_base(int is_legacy) const char *arch_vma_name(struct vm_area_struct *vma) { + if (vma->vm_flags & VM_SHADOW_STACK) + return "[shadow stack]"; return NULL; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 18f8c3acbb85..78d9b0fd2aee 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR [ilog2(VM_UFFD_MINOR)] = "ui", #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ +#ifdef CONFIG_ARCH_HAS_SHADOW_STACK + [ilog2(VM_SHADOW_STACK)] = "ss", +#endif }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index 2e74c0ab6d25..311c6018d503 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -308,11 +308,13 @@ extern unsigned int kobjsize(const void *objp); #define VM_HIGH_ARCH_BIT_2 34 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_3 35 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_BIT_4 36 /* bit only usable on 64-bit architectures */ +#define VM_HIGH_ARCH_BIT_5 37 /* bit only usable on 64-bit architectures */ #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0) #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1) #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2) #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3) #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) +#define VM_HIGH_ARCH_5 BIT(VM_HIGH_ARCH_BIT_5) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ #ifdef CONFIG_ARCH_HAS_PKEYS @@ -328,6 +330,12 @@ extern unsigned int kobjsize(const void *objp); #endif #endif /* CONFIG_ARCH_HAS_PKEYS */ +#ifdef CONFIG_X86_SHADOW_STACK +# define VM_SHADOW_STACK VM_HIGH_ARCH_5 +#else +# define VM_SHADOW_STACK VM_NONE +#endif + #if defined(CONFIG_X86) # define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) From patchwork Sun Jan 30 21:18:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12730145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 479C4C433F5 for ; Sun, 30 Jan 2022 21:22:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E2B2E6B008C; Sun, 30 Jan 2022 16:21:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DB30B6B0093; Sun, 30 Jan 2022 16:21:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C04886B0095; Sun, 30 Jan 2022 16:21:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0239.hostedemail.com [216.40.44.239]) by kanga.kvack.org (Postfix) with ESMTP id 9E9A36B008C for ; Sun, 30 Jan 2022 16:21:59 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 6F7E1972E4 for ; Sun, 30 Jan 2022 21:21:59 +0000 
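For illustration only (not part of the patch): with this series applied, a shadow stack VMA is named "[shadow stack]" by arch_vma_name() above, so it can be spotted from userspace. A minimal sketch:

	#include <stdio.h>
	#include <string.h>

	/* Print any shadow stack VMAs of the current task. */
	int main(void)
	{
		char line[512];
		FILE *f = fopen("/proc/self/maps", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			/* arch_vma_name() reports "[shadow stack]" */
			if (strstr(line, "[shadow stack]"))
				fputs(line, stdout);
		}
		fclose(f);
		return 0;
	}

The same VMAs also carry the new "ss" mnemonic in /proc/<pid>/smaps VmFlags.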
From patchwork Sun Jan 30 21:18:18 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730145
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH 15/35] x86/mm: Check Shadow Stack page fault errors
Date: Sun, 30 Jan 2022 13:18:18 -0800
Message-Id: <20220130211838.8382-16-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Shadow stack accesses are those that are performed by the CPU where it
expects to encounter a shadow stack mapping. They are performed
implicitly by CALL/RET at the location of the shadow stack pointer, and
explicitly by shadow stack management instructions like WRUSSQ.

Shadow stack accesses to shadow stack mappings can see faults in normal,
valid operation, just like regular accesses to regular mappings: shadow
stacks need some of the same features, such as delayed allocation, swap,
and copy-on-write. Shadow stack accesses can also result in errors, such
as when a shadow stack overflows, or when a shadow stack access occurs
to a non-shadow-stack mapping.

When handling a shadow stack page fault, verify that it occurred within
a shadow stack mapping; it is always an error otherwise. For valid
shadow stack accesses, set FAULT_FLAG_WRITE to effect copy-on-write.
Because clearing _PAGE_DIRTY (rather than _PAGE_RW) is used to trigger
the fault, a shadow stack read fault and a shadow stack write fault
cannot be differentiated, and both are handled as a write access.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---

Yu-cheng v30:
- Update Subject line and add a verb.

 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@
  *   bit 3 ==	1: use of reserved bit detected
  *   bit 4 ==	1: fault was an instruction fetch
  *   bit 5 ==	1: protection keys block access
+ *   bit 6 ==	1: shadow stack access fault
  *   bit 15 ==	1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@ enum x86_pf_error_code {
 	X86_PF_RSVD	=	1 << 3,
 	X86_PF_INSTR	=	1 << 4,
 	X86_PF_PK	=	1 << 5,
+	X86_PF_SHSTK	=	1 << 6,
 	X86_PF_SGX	=	1 << 15,
 };

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d0074c6ed31a..6769134986ec 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1107,6 +1107,17 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;

+	/*
+	 * Verify a shadow stack access is within a shadow stack VMA.
+	 * It is always an error otherwise.  Normal data accesses to a
+	 * shadow stack area are checked in the cases that follow.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (!(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
@@ -1300,6 +1311,14 @@ void do_user_addr_fault(struct pt_regs *regs,

 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

+	/*
+	 * Clearing _PAGE_DIRTY is used to detect shadow stack access.
+	 * This method cannot distinguish shadow stack read vs. write.
+	 * For valid shadow stack accesses, set FAULT_FLAG_WRITE to effect
+	 * copy-on-write.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
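The rule the patch adds, restated as a condensed sketch (not the kernel code verbatim): a shadow stack fault is legal only inside a shadow stack VMA, and it is always treated as a write.

	/* Sketch of the access_error()/do_user_addr_fault() rule above. */
	static bool shstk_access_ok(unsigned long error_code, vm_flags_t vm_flags)
	{
		if (!(error_code & X86_PF_SHSTK))
			return true;	/* not a shadow stack access */
		/* legal only within a shadow stack VMA */
		return !!(vm_flags & VM_SHADOW_STACK);
	}

Since the fault is triggered by a clear _PAGE_DIRTY rather than a clear _PAGE_RW, reads and writes through the shadow stack are indistinguishable here; both take the FAULT_FLAG_WRITE path so that copy-on-write happens.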
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 16/35] x86/mm: Update maybe_mkwrite() for shadow stack Date: Sun, 30 Jan 2022 13:18:19 -0800 Message-Id: <20220130211838.8382-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 74E7540002 X-Rspam-User: nil Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=bKLtdut3; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf27.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-Stat-Signature: d5ticxrg38i93qntoo6q8k7d9jk6ht5x X-HE-Tag: 1643577719-573060 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu When serving a page fault, maybe_mkwrite() makes a PTE writable if its vma has VM_WRITE. A shadow stack vma has VM_SHADOW_STACK. Its PTEs have _PAGE_DIRTY, but not _PAGE_WRITE. In fork(), _PAGE_DIRTY is cleared to cause copy-on-write, and in the page fault handler, _PAGE_DIRTY is restored and the shadow stack page is writable again. Introduce an x86 version of maybe_mkwrite(), which sets proper PTE bits according to VM flags. Apply the same changes to maybe_pmd_mkwrite(). Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook --- Yu-cheng v29: - Remove likely()'s. arch/x86/include/asm/pgtable.h | 6 ++++++ arch/x86/mm/pgtable.c | 20 ++++++++++++++++++++ include/linux/mm.h | 2 ++ mm/huge_memory.c | 2 ++ 4 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e1061b9cba6a..36166bdd0b98 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -282,6 +282,9 @@ static inline int pmd_trans_huge(pmd_t pmd) return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE; } +#define maybe_pmd_mkwrite maybe_pmd_mkwrite +extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma); + #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD static inline int pud_trans_huge(pud_t pud) { @@ -1660,6 +1663,9 @@ static inline bool arch_faults_on_old_pte(void) return false; } +#define maybe_mkwrite maybe_mkwrite +extern pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PGTABLE_H */ diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 3481b35cb4ec..c22c8e9c37e8 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -610,6 +610,26 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, } #endif +pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_WRITE) + pte = pte_mkwrite(pte); + else if (vma->vm_flags & VM_SHADOW_STACK) + pte = pte_mkwrite_shstk(pte); + return pte; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) +{ + if (vma->vm_flags & VM_WRITE) + pmd = pmd_mkwrite(pmd); + else if (vma->vm_flags & VM_SHADOW_STACK) + pmd = pmd_mkwrite_shstk(pmd); + return pmd; +} +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ + /** * reserve_top_address - reserves a hole in the top of kernel address space * @reserve - size of hole to reserve 
diff --git a/include/linux/mm.h b/include/linux/mm.h index 311c6018d503..b3cb3a17037b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -955,12 +955,14 @@ void free_compound_page(struct page *page); * pte_mkwrite. But get_user_pages can cause write faults for mappings * that do not have writing enabled, when used by access_process_vm. */ +#ifndef maybe_mkwrite static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pte = pte_mkwrite(pte); return pte; } +#endif vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page); void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr); diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 406a3c28c026..2adedcfca00b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -491,12 +491,14 @@ static int __init setup_transparent_hugepage(char *str) } __setup("transparent_hugepage=", setup_transparent_hugepage); +#ifndef maybe_pmd_mkwrite pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { if (likely(vma->vm_flags & VM_WRITE)) pmd = pmd_mkwrite(pmd); return pmd; } +#endif #ifdef CONFIG_MEMCG static inline struct deferred_split *get_deferred_split_queue(struct page *page) From patchwork Sun Jan 30 21:18:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12730147 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 611FAC43217 for ; Sun, 30 Jan 2022 21:22:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8AC236B0096; Sun, 30 Jan 2022 16:22:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 835FF6B0098; Sun, 30 Jan 2022 16:22:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 65EC76B0099; Sun, 30 Jan 2022 16:22:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0105.hostedemail.com [216.40.44.105]) by kanga.kvack.org (Postfix) with ESMTP id 4B32A6B0096 for ; Sun, 30 Jan 2022 16:22:01 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 063BB181DF763 for ; Sun, 30 Jan 2022 21:22:01 +0000 (UTC) X-FDA: 79088226042.04.C7A2CC6 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf18.hostedemail.com (Postfix) with ESMTP id 4A1811C0006 for ; Sun, 30 Jan 2022 21:22:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1643577720; x=1675113720; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=GcK8U902or9ILWaRmLm8X+dvVTHxi1AwyGT1VYKPQZ8=; b=GWQXs0inwx2P+HfxD6eYx1AK+PgaKj6mqxbGgxe8YNFpG9DpCjKshh/t GH0VmqyhEtAhd5Ut6CxkdUMzQL/IyzjiDNpoQ9GMJnyzIaYVr9oFv3IR/ YOWcEf/7A/vhwF28GP7/GOVAywfnI1TA5naQotnZkxXO1Fl4fN7nXCyXD xjf0+4j877nxxSX4zji0WasWM8XVNcMJyVnxYFcIIHnqCcEwlJtrdlf07 HH5WY3+uR9Fhsh1N7hhzQPRZjdNT07vBbfrAORFJ6fucAY5GakyT+jVJ4 Iz31mlyO6eHhuvqMfNCQAIvSMgoq40MsjmfBxCnD9u2czH50SQ0UyBw8Z w==; X-IronPort-AV: E=McAfee;i="6200,9189,10243"; a="244970213" X-IronPort-AV: E=Sophos;i="5.88,329,1635231600"; d="scan'208";a="244970213" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jan 2022 13:21:59 
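The net effect of the x86 override, summarized as a sketch (bit names per x86; the VM_NONE case is shown for completeness):

	/* Resulting PTE encoding after x86 maybe_mkwrite(pte, vma):
	 *   VM_WRITE VMA:        RW=1            (pte_mkwrite)
	 *   VM_SHADOW_STACK VMA: RW=0, Dirty=1   (pte_mkwrite_shstk)
	 *   neither:             PTE returned unchanged
	 */
	pte = maybe_mkwrite(pte, vma);

The #define/#ifndef pairing is the override mechanism: the arch header defines maybe_mkwrite to its own name, which suppresses the generic inline in include/linux/mm.h.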
From patchwork Sun Jan 30 21:18:20 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730147
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH 17/35] mm: Fixup places that call pte_mkwrite() directly
Date: Sun, 30 Jan 2022 13:18:20 -0800
Message-Id: <20220130211838.8382-18-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

When serving a page fault, maybe_mkwrite() makes a PTE writable if it is
in a writable vma. A shadow stack vma is writable, but its PTEs need
_PAGE_DIRTY to be set to become writable. For this reason,
maybe_mkwrite() has been updated.

There are a few places that call pte_mkwrite() directly but have the
same result as maybe_mkwrite(). These sites need to be updated for
shadow stack as well. Thus, change them to maybe_mkwrite():

- do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
  directly and call pte_mkwrite(), which is equivalent to
  maybe_mkwrite(). Change them to maybe_mkwrite().

- In do_numa_page(), if the NUMA entry was writable, pte_mkwrite() is
  called directly. Fix it by using maybe_mkwrite(). Make the same change
  to do_huge_pmd_numa_page().

- In change_pte_range(), pte_mkwrite() is called directly. Replace it
  with maybe_mkwrite().

A condensed before/after of the call-site pattern is shown after the
diff below.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---

Yu-cheng v25:
- Apply same changes to do_huge_pmd_numa_page() as to do_numa_page().

 mm/huge_memory.c | 2 +-
 mm/memory.c      | 5 ++---
 mm/migrate.c     | 3 +--
 mm/mprotect.c    | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2adedcfca00b..3588e9fefbe0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1489,7 +1489,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	pmd = pmd_modify(oldpmd, vma->vm_page_prot);
 	pmd = pmd_mkyoung(pmd);
 	if (was_writable)
-		pmd = pmd_mkwrite(pmd);
+		pmd = maybe_pmd_mkwrite(pmd, vma);
 	set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
 	update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 	spin_unlock(vmf->ptl);
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..c79444603d5d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3793,8 +3793,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)

 	entry = mk_pte(page, vma->vm_page_prot);
 	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
+	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
@@ -4428,7 +4427,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	pte = pte_modify(old_pte, vma->vm_page_prot);
 	pte = pte_mkyoung(pte);
 	if (was_writable)
-		pte = pte_mkwrite(pte);
+		pte = maybe_mkwrite(pte, vma);
 	ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte);
 	update_mmu_cache(vma, vmf->address, vmf->pte);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
diff --git a/mm/migrate.c b/mm/migrate.c
index c7da064b4781..438f1e21b9c7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2697,8 +2697,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 		}
 	} else {
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}

 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0138dfcdb1d8..b0012c13a00e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
-				ptent = pte_mkwrite(ptent);
+				ptent = maybe_mkwrite(ptent, vma);
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
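The call-site pattern, condensed from the diff above: the "before" form silently produces a read-only PTE for a shadow stack VMA (which lacks VM_WRITE), so the fault would simply repeat.

	/* before: shadow stack VMAs fall through unhandled */
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry));

	/* after: one helper covers both VM_WRITE and VM_SHADOW_STACK */
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);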
From patchwork Sun Jan 30 21:18:21 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730148
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH 18/35] mm: Add guard pages around a shadow stack.
Date: Sun, 30 Jan 2022 13:18:21 -0800
Message-Id: <20220130211838.8382-19-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

INCSSP(Q/D) increments the shadow stack pointer and 'pops and discards'
the first and the last elements in the range, effectively touching those
memory areas. The maximum moving distance is 255 * 8 = 2040 bytes for
INCSSPQ and 255 * 4 = 1020 bytes for INCSSPD; both are far smaller than
PAGE_SIZE. Thus, putting a guard gap page on both ends of a shadow stack
prevents INCSSP, CALL, and RET from going beyond it.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---

Yu-cheng v25:
- Move SHADOW_STACK_GUARD_GAP to arch/x86/mm/mmap.c.

Yu-cheng v24:
- Instead of changing vm_*_gap(), create x86-specific versions.

 arch/x86/include/asm/page_types.h |  7 +++++
 arch/x86/mm/mmap.c                | 46 +++++++++++++++++++++++++++++++
 include/linux/mm.h                |  4 +++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index a506a411474d..e1533fdc08b4 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);

 extern void initmem_init(void);

+#define vm_start_gap vm_start_gap
+struct vm_area_struct;
+extern unsigned long vm_start_gap(struct vm_area_struct *vma);
+
+#define vm_end_gap vm_end_gap
+extern unsigned long vm_end_gap(struct vm_area_struct *vma);
+
 #endif	/* !__ASSEMBLY__ */

 #endif	/* _ASM_X86_PAGE_DEFS_H */
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index f3f52c5e2fd6..81f9325084d3 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
 		return false;
 	return true;
 }
+
+/*
+ * The shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D).
+ * INCSSPQ moves the shadow stack pointer up to 255 * 8 = ~2 KB
+ * (~1 KB for INCSSPD) and touches the first and the last element
+ * in the range, which triggers a page fault if the range is not in
+ * a shadow stack.  Because of this, creating 4-KB guard pages around
+ * a shadow stack prevents these instructions from going beyond.
+ */
+#define SHADOW_STACK_GUARD_GAP PAGE_SIZE
+
+unsigned long vm_start_gap(struct vm_area_struct *vma)
+{
+	unsigned long vm_start = vma->vm_start;
+	unsigned long gap = 0;
+
+	if (vma->vm_flags & VM_GROWSDOWN)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		gap = SHADOW_STACK_GUARD_GAP;
+
+	if (gap != 0) {
+		vm_start -= gap;
+		if (vm_start > vma->vm_start)
+			vm_start = 0;
+	}
+	return vm_start;
+}
+
+unsigned long vm_end_gap(struct vm_area_struct *vma)
+{
+	unsigned long vm_end = vma->vm_end;
+	unsigned long gap = 0;
+
+	if (vma->vm_flags & VM_GROWSUP)
+		gap = stack_guard_gap;
+	else if (vma->vm_flags & VM_SHADOW_STACK)
+		gap = SHADOW_STACK_GUARD_GAP;
+
+	if (gap != 0) {
+		vm_end += gap;
+		if (vm_end < vma->vm_end)
+			vm_end = -PAGE_SIZE;
+	}
+	return vm_end;
+}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b3cb3a17037b..e125358d7f75 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2797,6 +2797,7 @@ struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr)
 	return vma;
 }

+#ifndef vm_start_gap
 static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_start = vma->vm_start;
@@ -2808,7 +2809,9 @@ static inline unsigned long vm_start_gap(struct vm_area_struct *vma)
 	}
 	return vm_start;
 }
+#endif

+#ifndef vm_end_gap
 static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 {
 	unsigned long vm_end = vma->vm_end;
@@ -2820,6 +2823,7 @@ static inline unsigned long vm_end_gap(struct vm_area_struct *vma)
 	}
 	return vm_end;
 }
+#endif

 static inline unsigned long vma_pages(struct vm_area_struct *vma)
 {
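The arithmetic behind the single-page gap, as a compile-time sketch (the macro names here are illustrative, not from the patch):

	#define INCSSPQ_MAX_BYTES (255 * 8)	/* 2040 */
	#define INCSSPD_MAX_BYTES (255 * 4)	/* 1020 */

	_Static_assert(INCSSPQ_MAX_BYTES < 4096,
		       "one 4 KB guard page bounds the largest INCSSP move");

Because the largest single INCSSP move is smaller than one page, and CALL/RET move the shadow stack pointer one slot at a time, a one-page gap at each end is sufficient to fault any overrun.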
From patchwork Sun Jan 30 21:18:22 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730149
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH 19/35] mm/mmap: Add shadow stack pages to memory accounting
Date: Sun, 30 Jan 2022 13:18:22 -0800
Message-Id: <20220130211838.8382-20-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Account shadow stack pages to stack memory.

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---

Yu-cheng v26:
- Remove redundant #ifdef CONFIG_MMU.

Yu-cheng v25:
- Remove #ifdef CONFIG_ARCH_HAS_SHADOW_STACK for is_shadow_stack_mapping().

Yu-cheng v24:
- Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().
- Change VM_SHSTK to VM_SHADOW_STACK.

 arch/x86/include/asm/pgtable.h | 3 +++
 arch/x86/mm/pgtable.c          | 5 +++++
 include/linux/pgtable.h        | 8 ++++++++
 mm/mmap.c                      | 5 +++++
 4 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 36166bdd0b98..55641498485c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1666,6 +1666,9 @@ static inline bool arch_faults_on_old_pte(void)
 #define maybe_mkwrite maybe_mkwrite
 extern pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma);

+#define is_shadow_stack_mapping is_shadow_stack_mapping
+extern bool is_shadow_stack_mapping(vm_flags_t vm_flags);
+
 #endif	/* __ASSEMBLY__ */

 #endif /* _ASM_X86_PGTABLE_H */
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c22c8e9c37e8..61a364b9ae0a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -884,3 +884,8 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)

 #endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
+
+bool is_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return vm_flags & VM_SHADOW_STACK;
+}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index bc8713a76e03..21fdb1273571 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -911,6 +911,14 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
 	__ptep_modify_prot_commit(vma, addr, ptep, pte);
 }
 #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
+
+#ifndef is_shadow_stack_mapping
+static inline bool is_shadow_stack_mapping(vm_flags_t vm_flags)
+{
+	return false;
+}
+#endif
+
 #endif /* CONFIG_MMU */

 /*
diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..9bab326332af 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1716,6 +1716,9 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 	if (file && is_file_hugepages(file))
 		return 0;

+	if (is_shadow_stack_mapping(vm_flags))
+		return 1;
+
 	return (vm_flags & (VM_NORESERVE | VM_SHARED | VM_WRITE)) == VM_WRITE;
 }

@@ -3345,6 +3348,8 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 		mm->stack_vm += npages;
 	else if (is_data_mapping(flags))
 		mm->data_vm += npages;
+	else if (is_shadow_stack_mapping(flags))
+		mm->stack_vm += npages;
 }

 static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
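The combined effect of the two hooks, as a sketch (two excerpts from the diff above, with the surrounding code elided): shadow stack mappings are charged against commit like writable anonymous memory even though VM_WRITE is clear, and their pages are reported as stack (VmStk) rather than data.

	/* accountable_mapping(): always charge shadow stacks */
	if (is_shadow_stack_mapping(vm_flags))
		return 1;

	/* vm_stat_account(): count shadow stack pages as stack */
	else if (is_shadow_stack_mapping(flags))
		mm->stack_vm += npages;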
Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 20/35] mm: Update can_follow_write_pte() for shadow stack Date: Sun, 30 Jan 2022 13:18:23 -0800 Message-Id: <20220130211838.8382-21-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 67A4740002 X-Rspam-User: nil Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dKgsIdou; dmarc=pass (policy=none) header.from=intel.com; spf=none (imf27.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com X-Stat-Signature: 94i1o7nkf5hfrhq79utfgk85be17tw7q X-HE-Tag: 1643577722-15063 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu Can_follow_write_pte() ensures a read-only page is COWed by checking the FOLL_COW flag, and uses pte_dirty() to validate the flag is still valid. Like a writable data page, a shadow stack page is writable, and becomes read-only during copy-on-write, but it is always dirty. Thus, in the can_follow_write_pte() check, it belongs to the writable page case and should be excluded from the read-only page pte_dirty() check. Apply the same changes to can_follow_write_pmd(). While at it, also split the long line into smaller ones. Signed-off-by: Yu-cheng Yu Reviewed-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Cc: Kees Cook --- Yu-cheng v26: - Instead of passing vm_flags, pass down vma pointer to can_follow_write_*(). Yu-cheng v25: - Split long line into smaller ones. Yu-cheng v24: - Change arch_shadow_stack_mapping() to is_shadow_stack_mapping(). mm/gup.c | 16 ++++++++++++---- mm/huge_memory.c | 16 ++++++++++++---- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index f0af462ac1e2..95b7d1084c44 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -464,10 +464,18 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address, * FOLL_FORCE can write to even unwritable pte's, but only * after we've gone through a COW cycle and they are dirty. 
*/ -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) +static inline bool can_follow_write_pte(pte_t pte, unsigned int flags, + struct vm_area_struct *vma) { - return pte_write(pte) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte)); + if (pte_write(pte)) + return true; + if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW)) + return false; + if (!pte_dirty(pte)) + return false; + if (is_shadow_stack_mapping(vma->vm_flags)) + return false; + return true; } static struct page *follow_page_pte(struct vm_area_struct *vma, @@ -510,7 +518,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, } if ((flags & FOLL_NUMA) && pte_protnone(pte)) goto no_page; - if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { + if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags, vma)) { pte_unmap_unlock(ptep, ptl); return NULL; } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3588e9fefbe0..1c7167e6f223 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1346,10 +1346,18 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf) * FOLL_FORCE can write to even unwritable pmd's, but only * after we've gone through a COW cycle and they are dirty. */ -static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags) +static inline bool can_follow_write_pmd(pmd_t pmd, unsigned int flags, + struct vm_area_struct *vma) { - return pmd_write(pmd) || - ((flags & FOLL_FORCE) && (flags & FOLL_COW) && pmd_dirty(pmd)); + if (pmd_write(pmd)) + return true; + if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW)) + return false; + if (!pmd_dirty(pmd)) + return false; + if (is_shadow_stack_mapping(vma->vm_flags)) + return false; + return true; } struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, @@ -1362,7 +1370,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, assert_spin_locked(pmd_lockptr(mm, pmd)); - if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags)) + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, flags, vma)) goto out; /* Avoid dumping huge zero page */ From patchwork Sun Jan 30 21:18:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 12730151 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11F54C433FE for ; Sun, 30 Jan 2022 21:22:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 799626B009C; Sun, 30 Jan 2022 16:22:04 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 772286B009E; Sun, 30 Jan 2022 16:22:04 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E9546B009F; Sun, 30 Jan 2022 16:22:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0053.hostedemail.com [216.40.44.53]) by kanga.kvack.org (Postfix) with ESMTP id 4D9546B009C for ; Sun, 30 Jan 2022 16:22:04 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id ED92595295 for ; Sun, 30 Jan 2022 21:22:03 +0000 (UTC) X-FDA: 79088226126.26.8FCBC3B Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by imf18.hostedemail.com (Postfix) with ESMTP id 2925E1C0002 for ; Sun, 30 Jan 2022 21:22:02 +0000 
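The resulting FOLL_FORCE decision, restated as a sketch. The only behavioral change is the final shadow stack test: a shadow stack PTE is always dirty, so pte_dirty() no longer proves that a COW cycle has happened.

	static inline bool can_follow_write(bool writable, bool dirty,
					    unsigned int flags,
					    vm_flags_t vm_flags)
	{
		if (writable)
			return true;
		if ((flags & (FOLL_FORCE | FOLL_COW)) != (FOLL_FORCE | FOLL_COW))
			return false;	/* must go through a COW cycle first */
		if (!dirty)
			return false;	/* COW cycle has not happened yet */
		if (vm_flags & VM_SHADOW_STACK)
			return false;	/* dirty is the default state here, not proof */
		return true;
	}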
From patchwork Sun Jan 30 21:18:24 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730151
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH 21/35] mm/mprotect: Exclude shadow stack from preserve_write
Date: Sun, 30 Jan 2022 13:18:24 -0800
Message-Id: <20220130211838.8382-22-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

In change_pte_range(), when a PTE is changed for prot_numa, _PAGE_RW is
preserved to avoid an additional write fault after the NUMA hinting
fault. However, pte_write() now returns true for both normal writable
PTEs and shadow stack (RW=0, Dirty=1) PTEs; the latter does not have
_PAGE_RW set, so there is nothing to preserve.

Exclude shadow stack from the preserve_write test, and apply the same
change to change_huge_pmd().

Signed-off-by: Yu-cheng Yu
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
---

v25:
- Move is_shadow_stack_mapping() to a separate line.

v24:
- Change arch_shadow_stack_mapping() to is_shadow_stack_mapping().

 mm/huge_memory.c | 7 +++++++
 mm/mprotect.c    | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1c7167e6f223..01375e39b52b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1750,6 +1750,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		return 0;

 	preserve_write = prot_numa && pmd_write(*pmd);
+
+	/*
+	 * Preserve only normal writable huge PMD, but not shadow
+	 * stack (RW=0, Dirty=1).
+	 */
+	if (is_shadow_stack_mapping(vma->vm_flags))
+		preserve_write = false;
 	ret = 1;

 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b0012c13a00e..faac710f0891 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -77,6 +77,13 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);

+			/*
+			 * Preserve only normal writable PTE, but not shadow
+			 * stack (RW=0, Dirty=1).
+			 */
+			if (is_shadow_stack_mapping(vma->vm_flags))
+				preserve_write = false;
+
 			/*
 			 * Avoid trapping faults against the zero or KSM
 			 * pages. See similar comment in change_huge_pmd.
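What the exclusion means on the NUMA-hinting path, as a sketch: pte_write() returns true for a shadow stack PTE (because Dirty=1), but there is no _PAGE_RW bit to preserve, so the flag is forced off and the eventual fault rebuilds the shadow stack encoding through maybe_mkwrite().

	bool preserve_write = prot_numa && pte_write(oldpte);

	if (is_shadow_stack_mapping(vma->vm_flags))
		preserve_write = false;	/* nothing to preserve: RW=0 */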
From patchwork Sun Jan 30 21:18:25 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730152
From: Rick Edgecombe
To: x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org (same recipient list as the rest of the series)
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks
Date: Sun, 30 Jan 2022 13:18:25 -0800
Message-Id: <20220130211838.8382-23-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

Shadow stack accesses are writes from the handle_mm_fault() perspective,
so to generate the correct PTE, maybe_mkwrite() relies on the presence
of VM_SHADOW_STACK or VM_WRITE in the vma.

In future patches, once VM_SHADOW_STACK is actually creatable by
userspace, a problem could arise if a user calls
mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The
code would then be confused in the event of shadow stack accesses and
create a writable PTE for a shadow stack access; the process would then
fault in a loop.

Prevent this from happening by blocking this kind of memory (VM_WRITE
plus VM_SHADOW_STACK) from being created, instead of complicating the
fault handler logic to handle it.

Add an x86 arch_validate_flags() implementation to perform the check.
Rename the uapi/asm/mman.h header guard so that the _ASM_X86_MMAN_H name
can be used for arch/x86/include/asm/mman.h, where arch_validate_flags()
will live.

Signed-off-by: Rick Edgecombe
---

v1:
- New patch.
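The user-visible effect, sketched under the assumption that this series is applied (the shadow stack setup is elided; the error value follows from the arch_validate_flags() check in mprotect()):

	/* 'ssp' points at an existing shadow stack mapping (setup elided). */
	if (mprotect(ssp, 4096, PROT_WRITE) != 0)
		perror("mprotect");	/* expected: EINVAL */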
 arch/x86/include/asm/mman.h      | 21 +++++++++++++++++++++
 arch/x86/include/uapi/asm/mman.h |  6 +++---
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/mman.h

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
new file mode 100644
index 000000000000..b44fe31deb3a
--- /dev/null
+++ b/arch/x86/include/asm/mman.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#include
+#include
+
+#ifdef CONFIG_X86_SHADOW_STACK
+static inline bool arch_validate_flags(unsigned long vm_flags)
+{
+	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
+		return false;
+
+	return true;
+}
+
+#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags)
+
+#endif /* CONFIG_X86_SHADOW_STACK */
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..9704e27c4d24 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _ASM_X86_MMAN_H
-#define _ASM_X86_MMAN_H
+#ifndef _UAPI_ASM_X86_MMAN_H
+#define _UAPI_ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
@@ -28,4 +28,4 @@
 
 #include
 
-#endif /* _ASM_X86_MMAN_H */
+#endif /* _UAPI_ASM_X86_MMAN_H */

From patchwork Sun Jan 30 21:18:26 2022
From: Rick Edgecombe
Subject: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
Date: Sun, 30 Jan 2022 13:18:26 -0800
Message-Id: <20220130211838.8382-24-rick.p.edgecombe@intel.com>

Add helpers that can be used to modify supervisor xstate safely for the
current task.

State for supervisor xstate based features can be live and accessed via
MSRs, or saved in memory in an xsave buffer. When the kernel needs to
modify this state it needs to be sure to operate on it in the right place,
so the modifications don't get clobbered.

In the past, supervisor xstate features have used get_xsave_addr()
directly and performed open coded logic to operate on the saved state
correctly. This has posed two problems:
 1. The logic has been gotten wrong more than once.
 2. To reduce code, less common paths are not optimized. Determination
    of which paths are less common is based on assumptions about far
    away code that could change.

In addition, now that get_xsave_addr() is not available outside of the
core fpu code, there isn't even a way for these supervisor features to
modify the in-memory state.

To resolve these problems, add some helpers that encapsulate the correct
logic to operate on the correct copy of the state. Map the MSRs to their
struct field locations in a case statement in __get_xsave_member().

Use the helpers like this, to write to either the MSR or the saved state:

	void *xstate;

	xstate = start_update_xsave_msrs(XFEATURE_FOO);
	r = xsave_rdmsrl(xstate, MSR_IA32_FOO_1, &val);
	if (!r)
		xsave_wrmsrl(xstate, MSR_IA32_FOO_2, FOO_ENABLE);
	end_update_xsave_msrs();

Signed-off-by: Rick Edgecombe

---
v1:
 - New patch.
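As a further illustration (a sketch, not from the patch; MSR_IA32_FOO and
FOO_ENABLE are hypothetical placeholders), a feature that only needs to
flip one enable bit can use the read-modify-write helper added below
directly:

	void *xstate;
	int err;

	/* Resolve whether to touch the live MSR or the xsave buffer */
	xstate = start_update_xsave_msrs(XFEATURE_FOO);
	/* Set FOO_ENABLE, clear no bits; the write is skipped if the
	 * value is unchanged.
	 */
	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_FOO, FOO_ENABLE, 0);
	end_update_xsave_msrs();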
 arch/x86/include/asm/fpu/api.h |   5 ++
 arch/x86/kernel/fpu/xstate.c   | 134 +++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index c83b3020350a..6aec27984b62 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -165,4 +165,9 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 struct task_struct;
 extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
 
+void *start_update_xsave_msrs(int xfeature_nr);
+void end_update_xsave_msrs(void);
+int xsave_rdmsrl(void *state, unsigned int msr, unsigned long long *p);
+int xsave_wrmsrl(void *state, u32 msr, u64 val);
+int xsave_set_clear_bits_msrl(void *state, u32 msr, u64 set, u64 clear);
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 44397202762b..c5e20e0d0725 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1867,3 +1867,137 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
 	return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+static u64 *__get_xsave_member(void *xstate, u32 msr)
+{
+	switch (msr) {
+	/* Currently there are no MSRs supported */
+	default:
+		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
+		return NULL;
+	}
+}
+
+/*
+ * Return a pointer to the xstate for the feature if it should be used, or NULL
+ * if the MSRs should be written to directly. To do this safely, using the
+ * associated read/write helpers is required.
+ */
+void *start_update_xsave_msrs(int xfeature_nr)
+{
+	void *xstate;
+
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * in an interrupt could screw up some in-progress fpregs operation,
+	 * but appear to work. Warn about it.
+	 */
+	WARN_ON_ONCE(!in_task());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	/*
+	 * If the registers don't need to be reloaded, go ahead and operate
+	 * on the registers.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		return NULL;
+
+	xstate = get_xsave_addr(&current->thread.fpu.fpstate->regs.xsave, xfeature_nr);
+
+	/*
+	 * If regs are in the init state, they can't be retrieved from
+	 * init_fpstate due to the init optimization, but are not necessarily
+	 * zero. The only option is to restore to make everything live and
+	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
+	 *
+	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is set,
+	 * operate on the buffer. The registers will be restored before going
+	 * to userspace in any case, but the task might get preempted before
+	 * then, so this possibly saves an xsave.
+	 */
+	if (!xstate)
+		fpregs_restore_userregs();
+	return xstate;
+}
+
+void end_update_xsave_msrs(void)
+{
+	fpregs_unlock();
+}
+
+/*
+ * When TIF_NEED_FPU_LOAD is set and fpregs_state_valid() is true, the saved
+ * state and fp state match. In this case, the kernel has some good options -
+ * it can skip the restore before returning to userspace or it could skip
+ * an xsave if preempted before then.
+ *
+ * But if this correspondence is broken by either a write to the in-memory
+ * buffer or the registers, the kernel needs to be notified so it doesn't miss
+ * an xsave or restore. __xsave_msrl_prepare_write() performs this check and
+ * notifies the kernel if needed. Use before writes only, to not take away
+ * the kernel's options when not required.
+ *
+ * If TIF_NEED_FPU_LOAD is set, then the logic in start_update_xsave_msrs()
+ * must have resulted in targeting the in-memory state, so invalidating the
+ * registers is the right thing to do.
+ */
+static void __xsave_msrl_prepare_write(void)
+{
+	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
+	    fpregs_state_valid(&current->thread.fpu, smp_processor_id()))
+		__fpu_invalidate_fpregs_state(&current->thread.fpu);
+}
+
+int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
+{
+	u64 *member_ptr;
+
+	if (!xstate)
+		return rdmsrl_safe(msr, p);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*p = *member_ptr;
+
+	return 0;
+}
+
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+{
+	u64 *member_ptr;
+
+	__xsave_msrl_prepare_write();
+	if (!xstate)
+		return wrmsrl_safe(msr, val);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*member_ptr = val;
+
+	return 0;
+}
+
+int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
+{
+	u64 val, new_val;
+	int ret;
+
+	ret = xsave_rdmsrl(xstate, msr, &val);
+	if (ret)
+		return ret;
+
+	new_val = (val & ~clear) | set;
+
+	if (new_val != val)
+		return xsave_wrmsrl(xstate, msr, new_val);
+
+	return 0;
+}

From patchwork Sun Jan 30 21:18:27 2022
From: Rick Edgecombe
Subject: [PATCH 24/35] mm: Re-introduce vm_flags to do_mmap()
Date: Sun, 30 Jan 2022 13:18:27 -0800
Message-Id: <20220130211838.8382-25-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

There was no caller left passing vm_flags to do_mmap(), so vm_flags was
removed from the function's input by commit 45e55300f114 ("mm: remove
unnecessary wrapper function do_mmap_pgoff()").

There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK
to do_mmap(). Thus, re-introduce vm_flags to do_mmap().
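To preview the call shape of the new user (a sketch consistent with the
shadow stack allocation added later in this series; size and unused are
local variables there):

	/* Sketch: an anonymous, read-only mapping whose VMA carries
	 * VM_SHADOW_STACK in through the re-introduced vm_flags argument.
	 */
	addr = do_mmap(NULL, 0, size, PROT_READ,
		       MAP_ANONYMOUS | MAP_PRIVATE, VM_SHADOW_STACK,
		       0, &unused, NULL);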
Signed-off-by: Yu-cheng Yu
Reviewed-by: Peter Collingbourne
Reviewed-by: Kees Cook
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Rick Edgecombe
Cc: Andrew Morton
Cc: Oleg Nesterov
Cc: linux-mm@kvack.org

---
 fs/aio.c           |  2 +-
 include/linux/mm.h |  3 ++-
 ipc/shm.c          |  2 +-
 mm/mmap.c          | 10 +++++-----
 mm/nommu.c         |  4 ++--
 mm/util.c          |  2 +-
 6 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 4ceba13a7db0..a24618e0e3fc 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -554,7 +554,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 	ctx->mmap_base = do_mmap(ctx->aio_ring_file, 0, ctx->mmap_size,
 				 PROT_READ | PROT_WRITE,
-				 MAP_SHARED, 0, &unused, NULL);
+				 MAP_SHARED, 0, 0, &unused, NULL);
 	mmap_write_unlock(mm);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		ctx->mmap_size = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e125358d7f75..481e1271409f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2689,7 +2689,8 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	struct list_head *uf);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
-	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
+	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
+	struct list_head *uf);
 extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
diff --git a/ipc/shm.c b/ipc/shm.c
index b3048ebd5c31..f236b3e14ec4 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1646,7 +1646,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 		goto invalid;
 	}
 
-	addr = do_mmap(file, addr, size, prot, flags, 0, &populate, NULL);
+	addr = do_mmap(file, addr, size, prot, flags, 0, 0, &populate, NULL);
 	*raddr = addr;
 	err = 0;
 	if (IS_ERR_VALUE(addr))
diff --git a/mm/mmap.c b/mm/mmap.c
index 9bab326332af..9c82a1b02cfc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1410,11 +1410,11 @@ static inline bool file_mmap_ok(struct file *file, struct inode *inode,
  */
 unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long len, unsigned long prot,
-			unsigned long flags, unsigned long pgoff,
-			unsigned long *populate, struct list_head *uf)
+			unsigned long flags, vm_flags_t vm_flags,
+			unsigned long pgoff, unsigned long *populate,
+			struct list_head *uf)
 {
 	struct mm_struct *mm = current->mm;
-	vm_flags_t vm_flags;
 	int pkey = 0;
 
 	*populate = 0;
@@ -1474,7 +1474,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	 * to. we assume access permissions have been handled by the open
 	 * of the memory object, so we don't do any here.
	 */
-	vm_flags = calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
+	vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
 			mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
 
 	if (flags & MAP_LOCKED)
@@ -3011,7 +3011,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 
 	file = get_file(vma->vm_file);
 	ret = do_mmap(vma->vm_file, start, size,
-			prot, flags, pgoff, &populate, NULL);
+			prot, flags, 0, pgoff, &populate, NULL);
 	fput(file);
 out:
 	mmap_write_unlock(mm);
diff --git a/mm/nommu.c b/mm/nommu.c
index 55a9e48a7a02..a6e0243cd69b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1057,6 +1057,7 @@ unsigned long do_mmap(struct file *file,
 			unsigned long len,
 			unsigned long prot,
 			unsigned long flags,
+			vm_flags_t vm_flags,
 			unsigned long pgoff,
 			unsigned long *populate,
 			struct list_head *uf)
@@ -1064,7 +1065,6 @@ unsigned long do_mmap(struct file *file,
 	struct vm_area_struct *vma;
 	struct vm_region *region;
 	struct rb_node *rb;
-	vm_flags_t vm_flags;
 	unsigned long capabilities, result;
 	int ret;
@@ -1083,7 +1083,7 @@ unsigned long do_mmap(struct file *file,
 	/* we've determined that we can make the mapping, now translate what we
 	 * now know into VMA flags */
-	vm_flags = determine_vm_flags(file, prot, flags, capabilities);
+	vm_flags |= determine_vm_flags(file, prot, flags, capabilities);
 
 	/* we're going to need to record the mapping */
 	region = kmem_cache_zalloc(vm_region_jar, GFP_KERNEL);
diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..d419821364cc 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -516,7 +516,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	if (!ret) {
 		if (mmap_write_lock_killable(mm))
 			return -EINTR;
-		ret = do_mmap(file, addr, len, prot, flag, pgoff, &populate,
+		ret = do_mmap(file, addr, len, prot, flag, 0, pgoff, &populate,
 			      &uf);
 		mmap_write_unlock(mm);
 		userfaultfd_unmap_complete(mm, &uf);

From patchwork Sun Jan 30 21:18:28 2022
From: Rick Edgecombe
Subject: [PATCH 25/35] x86/cet/shstk: Add user-mode shadow stack support
Date: Sun, 30 Jan 2022 13:18:28 -0800
Message-Id: <20220130211838.8382-26-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

Introduce basic shadow stack enabling/disabling/allocation routines. A
task's shadow stack is allocated from memory with the VM_SHADOW_STACK flag
and has a fixed size of min(RLIMIT_STACK, 4 GB).

Add the user shadow stack MSRs to the xsave helpers, so they can be used
to implement the functionality.

Keep the task's shadow stack address and size in thread_struct. This will
be copied when cloning new threads, but needs to be cleared during exec,
so add a function to do this.

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe
Cc: Kees Cook

---
v1:
 - Switch to xsave helpers.
 - Expand commit log.

Yu-cheng v30:
 - Remove superfluous comments for struct thread_shstk.
 - Replace 'populate' with 'unused'.

Yu-cheng v28:
 - Update shstk_setup() with wrmsrl_safe(), returns success when shadow
   stack feature is not present (since this is a setup function).
Yu-cheng v27:
 - Change 'struct cet_status' to 'struct thread_shstk', and change member
   types from unsigned long to u64.
 - Re-order local variables in reverse order of length.
 - WARN_ON_ONCE() when vm_munmap() fails.

 arch/x86/include/asm/cet.h       |  29 ++++++
 arch/x86/include/asm/processor.h |   5 ++
 arch/x86/kernel/Makefile         |   1 +
 arch/x86/kernel/fpu/xstate.c     |   5 +-
 arch/x86/kernel/process_64.c     |   2 +
 arch/x86/kernel/shstk.c          | 149 +++++++++++++++++++++++++++++++
 6 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/cet.h
 create mode 100644 arch/x86/kernel/shstk.c

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
new file mode 100644
index 000000000000..de90e4ae083a
--- /dev/null
+++ b/arch/x86/include/asm/cet.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CET_H
+#define _ASM_X86_CET_H
+
+#ifndef __ASSEMBLY__
+#include
+
+struct task_struct;
+
+struct thread_shstk {
+	u64	base;
+	u64	size;
+};
+
+#ifdef CONFIG_X86_SHADOW_STACK
+int shstk_setup(void);
+void shstk_free(struct task_struct *p);
+int shstk_disable(void);
+void reset_thread_shstk(void);
+#else
+static inline void shstk_setup(void) {}
+static inline void shstk_free(struct task_struct *p) {}
+static inline void shstk_disable(void) {}
+static inline void reset_thread_shstk(void) {}
+#endif /* CONFIG_X86_SHADOW_STACK */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_X86_CET_H */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2c5f12ae7d04..a9f4e9c4ca81 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -27,6 +27,7 @@ struct vm86;
 #include
 #include
 #include
+#include
 #include
 #include
@@ -528,6 +529,10 @@ struct thread_struct {
 	 */
 	u32			pkru;
 
+#ifdef CONFIG_X86_SHADOW_STACK
+	struct thread_shstk	shstk;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 6aef9ee28a39..d60ae6c365c7 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -153,6 +153,7 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= sev.o
 
 obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
 
+obj-$(CONFIG_X86_SHADOW_STACK)		+= shstk.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c5e20e0d0725..25b1b0c417fd 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1871,7 +1871,10 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
 static u64 *__get_xsave_member(void *xstate, u32 msr)
 {
 	switch (msr) {
-	/* Currently there are no MSRs supported */
+	case MSR_IA32_PL3_SSP:
+		return &((struct cet_user_state *)xstate)->user_ssp;
+	case MSR_IA32_U_CET:
+		return &((struct cet_user_state *)xstate)->user_cet;
 	default:
 		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
 		return NULL;
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3402edec236c..f05fe27d4967 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -514,6 +514,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
 		load_gs_index(__USER_DS);
 	}
 
+	reset_thread_shstk();
+
 	loadsegment(fs, 0);
 	loadsegment(es, _ds);
 	loadsegment(ds, _ds);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
new file mode 100644
index 000000000000..4e8686ed885f
--- /dev/null
+++ b/arch/x86/kernel/shstk.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * shstk.c - Intel
shadow stack support
+ *
+ * Copyright (c) 2021, Intel Corporation.
+ * Yu-cheng Yu
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static unsigned long alloc_shstk(unsigned long size)
+{
+	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, unused;
+
+	mmap_write_lock(mm);
+	addr = do_mmap(NULL, 0, size, PROT_READ, flags, VM_SHADOW_STACK, 0,
+		       &unused, NULL);
+	mmap_write_unlock(mm);
+
+	return addr;
+}
+
+static void unmap_shadow_stack(u64 base, u64 size)
+{
+	while (1) {
+		int r;
+
+		r = vm_munmap(base, size);
+
+		/*
+		 * vm_munmap() returns -EINTR when mmap_lock is held by
+		 * something else, and that lock should not be held for a
+		 * long time. Retry in that case.
+		 */
+		if (r == -EINTR) {
+			cond_resched();
+			continue;
+		}
+
+		/*
+		 * For all other types of vm_munmap() failure, either the
+		 * system is out of memory or there is a bug.
+		 */
+		WARN_ON_ONCE(r);
+		break;
+	}
+}
+
+int shstk_setup(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	unsigned long addr, size;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    shstk->size ||
+	    shstk->base)
+		return 1;
+
+	size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
+	addr = alloc_shstk(size);
+	if (IS_ERR_VALUE(addr))
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, addr + size);
+	if (!err)
+		err = xsave_wrmsrl(xstate, MSR_IA32_U_CET, CET_SHSTK_EN);
+	end_update_xsave_msrs();
+
+	if (err) {
+		/*
+		 * Don't leak the shadow stack if something went wrong with
+		 * writing the MSRs. Warn about it because things may be in
+		 * a weird state.
+		 */
+		WARN_ON_ONCE(1);
+		unmap_shadow_stack(addr, size);
+		return 1;
+	}
+
+	shstk->base = addr;
+	shstk->size = size;
+	return 0;
+}
+
+void reset_thread_shstk(void)
+{
+	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
+}
+
+void shstk_free(struct task_struct *tsk)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !shstk->size ||
+	    !shstk->base)
+		return;
+
+	if (!tsk->mm)
+		return;
+
+	unmap_shadow_stack(shstk->base, shstk->size);
+
+	shstk->base = 0;
+	shstk->size = 0;
+}
+
+int shstk_disable(void)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK) ||
+	    !shstk->size ||
+	    !shstk->base)
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_SHSTK_EN);
+	if (!err)
+		err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, 0);
+	end_update_xsave_msrs();
+
+	if (err)
+		return 1;
+
+	shstk_free(current);
+	return 0;
+}

From patchwork Sun Jan 30 21:18:29 2022
From: Rick Edgecombe
Subject: [PATCH 26/35] x86/process: Change copy_thread() argument 'arg' to 'stack_size'
Date: Sun, 30 Jan 2022 13:18:29 -0800
Message-Id: <20220130211838.8382-27-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

The single call site of copy_thread() passes stack size in 'arg'. To make
this clear, and in preparation for using this argument for shadow stack
allocation, change 'arg' to 'stack_size'. No functional changes.
Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe

---
 arch/x86/kernel/process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 81d8ef036637..82a816178e7f 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -130,8 +130,9 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 	return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
-		struct task_struct *p, unsigned long tls)
+int copy_thread(unsigned long clone_flags, unsigned long sp,
+		unsigned long stack_size, struct task_struct *p,
+		unsigned long tls)
 {
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
@@ -175,7 +176,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		p->thread.pkru = pkru_get_init_value();
 		memset(childregs, 0, sizeof(struct pt_regs));
-		kthread_frame_init(frame, sp, arg);
+		kthread_frame_init(frame, sp, stack_size);
 		return 0;
 	}
@@ -208,7 +209,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	 */
 	childregs->sp = 0;
 	childregs->ip = 0;
-	kthread_frame_init(frame, sp, arg);
+	kthread_frame_init(frame, sp, stack_size);
 	return 0;
 }

From patchwork Sun Jan 30 21:18:30 2022
From: Rick Edgecombe
Subject: [PATCH 27/35] x86/fpu: Add unsafe xsave buffer helpers
Date: Sun, 30 Jan 2022 13:18:30 -0800
Message-Id: <20220130211838.8382-28-rick.p.edgecombe@intel.com>

CET will need to modify the xsave buffer of a new FPU that was just
created in the process of copying a thread. In this case the normal
helpers will not work, because they operate on the current thread's FPU.

So add unsafe helpers to allow for this kind of modification. Make the
unsafe helpers operate on the MSR like the safe helpers, for symmetry and
to avoid exposing the underlying xsave structures. Don't add a read
helper, because it is not needed at this time.
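The intended call pattern (a sketch; the concrete caller arrives in the
thread shadow stack patch later in this series, and new_ssp is a
placeholder value here) looks like:

	/* tsk's FPU was just copied and tsk cannot run yet, so the xsave
	 * buffer can be written without the fpregs locking that the safe
	 * helpers perform.
	 */
	void *xstate = get_xsave_buffer_unsafe(&tsk->thread.fpu, XFEATURE_CET_USER);

	if (!WARN_ON_ONCE(!xstate))
		xsave_wrmsrl_unsafe(xstate, MSR_IA32_PL3_SSP, new_ssp);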
Signed-off-by: Rick Edgecombe

---
 arch/x86/include/asm/fpu/api.h |  9 ++++++---
 arch/x86/kernel/fpu/xstate.c   | 27 ++++++++++++++++++++++-----
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 6aec27984b62..5cb557b9d118 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -167,7 +167,10 @@ extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long
 void *start_update_xsave_msrs(int xfeature_nr);
 void end_update_xsave_msrs(void);
-int xsave_rdmsrl(void *state, unsigned int msr, unsigned long long *p);
-int xsave_wrmsrl(void *state, u32 msr, u64 val);
-int xsave_set_clear_bits_msrl(void *state, u32 msr, u64 set, u64 clear);
+int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p);
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val);
+int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear);
+
+void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr);
+int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val);
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 25b1b0c417fd..71b08026474c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1881,6 +1881,17 @@ static u64 *__get_xsave_member(void *xstate, u32 msr)
 	}
 }
 
+/*
+ * Operate on the xsave buffer directly. It makes no guarantees that the
+ * buffer will stay valid now or in the future. This function is pretty
+ * much only useful when the caller knows the fpu's thread can't be
+ * scheduled or otherwise operated on concurrently.
+ */
+void *get_xsave_buffer_unsafe(struct fpu *fpu, int xfeature_nr)
+{
+	return get_xsave_addr(&fpu->fpstate->regs.xsave, xfeature_nr);
+}
+
 /*
  * Return a pointer to the xstate for the feature if it should be used, or NULL
  * if the MSRs should be written to directly. To do this safely, using the
@@ -1971,14 +1982,11 @@ int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
 	return 0;
 }
 
-int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+
+int xsave_wrmsrl_unsafe(void *xstate, u32 msr, u64 val)
 {
 	u64 *member_ptr;
 
-	__xsave_msrl_prepare_write();
-	if (!xstate)
-		return wrmsrl_safe(msr, val);
-
 	member_ptr = __get_xsave_member(xstate, msr);
 	if (!member_ptr)
 		return 1;
@@ -1988,6 +1996,15 @@ int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
 	return 0;
 }
 
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+{
+	__xsave_msrl_prepare_write();
+	if (!xstate)
+		return wrmsrl_safe(msr, val);
+
+	return xsave_wrmsrl_unsafe(xstate, msr, val);
+}
+
 int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
 {
 	u64 val, new_val;

From patchwork Sun Jan 30 21:18:31 2022
Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 28/35] x86/cet/shstk: Handle thread shadow stack Date: Sun, 30 Jan 2022 13:18:31 -0800 Message-Id: <20220130211838.8382-29-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=lt6nNylt; spf=none (imf09.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspam-User: nil X-Rspamd-Queue-Id: 6C7A5140004 X-Stat-Signature: xzq8ut9onx7p8n63nfiogbdymu384puc X-Rspamd-Server: rspam12 X-HE-Tag: 1643577730-634729 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-cet case this is handled in two ways. With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks. For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack. For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it. Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB. For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork(). Userspace implementing posix vfork() can actually prevent the child from returning from the vfork() calling function, using CET. Glibc does this by adjusting the shadow stack pointer in the child, so that the child receives a #CP if it tries to return from vfork() calling function. 
Free the shadow stack on thread exit by doing it in mm_release(). Skip
this when exiting a vfork() child, since the stack is shared in the
parent.

Signed-off-by: Yu-cheng Yu
Signed-off-by: Rick Edgecombe

---
v1:
 - Expand commit log.
 - Add more comments.
 - Switch to xsave helpers.

Yu-cheng v30:
 - Update comments about clone()/clone3(). (Borislav Petkov)

Yu-cheng v29:
 - WARN_ON_ONCE() when get_xsave_addr() returns NULL, and update
   comments. (Dave Hansen)

Yu-cheng v28:
 - Split out copy_thread() argument name changes to a new patch.
 - Add compatibility for earlier clone(), which does not pass stack_size.
 - Add comment for get_xsave_addr(), explain the handling of null return
   value.

 arch/x86/include/asm/cet.h         |  5 +++
 arch/x86/include/asm/mmu_context.h |  2 +
 arch/x86/kernel/process.c          |  6 +++
 arch/x86/kernel/shstk.c            | 68 +++++++++++++++++++++++++++++-
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index de90e4ae083a..63ee8b45080d 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -14,11 +14,16 @@ struct thread_shstk {
 
 #ifdef CONFIG_X86_SHADOW_STACK
 int shstk_setup(void);
+int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
+			     unsigned long stack_size);
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
 #else
 static inline void shstk_setup(void) {}
+static inline int shstk_alloc_thread_stack(struct task_struct *p,
+					   unsigned long clone_flags,
+					   unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 static inline void reset_thread_shstk(void) {}
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..8e721d2c45d5 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -146,6 +146,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		shstk_free(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 82a816178e7f..0fbcf33255fa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 
 #include "process.h"
 
@@ -117,6 +118,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -217,6 +219,10 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
 
+	/* Allocate a new shadow stack for pthread */
+	if (!ret)
+		ret = shstk_alloc_thread_stack(p, clone_flags, stack_size);
+
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 4e8686ed885f..358f24e806cc 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -106,6 +106,66 @@ void reset_thread_shstk(void)
 	memset(&current->thread.shstk, 0, sizeof(struct thread_shstk));
 }
 
+int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+			     unsigned long stack_size)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr;
+	void *xstate;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!shstk->size)
+		return 0;
+
+	/*
+	 * clone() does not pass stack_size, which was added to clone3().
+	 * Use RLIMIT_STACK and cap to 4 GB.
+	 */
+	if (!stack_size)
+		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
+
+	/*
+	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * stack.
+	 */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, allocate 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		stack_size /= 4;
+
+	/*
+	 * 'tsk' is configured with a shadow stack and the fpu.state is
+	 * up to date since it was just copied from the parent. There
+	 * must be a valid non-init CET state location in the buffer.
+	 */
+	xstate = get_xsave_buffer_unsafe(&tsk->thread.fpu, XFEATURE_CET_USER);
+	if (WARN_ON_ONCE(!xstate))
+		return -EINVAL;
+
+	stack_size = PAGE_ALIGN(stack_size);
+	addr = alloc_shstk(stack_size);
+	if (IS_ERR_VALUE(addr)) {
+		shstk->base = 0;
+		shstk->size = 0;
+		return PTR_ERR((void *)addr);
+	}
+
+	xsave_wrmsrl_unsafe(xstate, MSR_IA32_PL3_SSP, (u64)(addr + stack_size));
+	shstk->base = addr;
+	shstk->size = stack_size;
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
@@ -115,7 +175,13 @@ void shstk_free(struct task_struct *tsk)
 	    !shstk->base)
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it. In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	unmap_shadow_stack(shstk->base, shstk->size);

From patchwork Sun Jan 30 21:18:32 2022
eliSitMNypv1lreYGDptC9ID7bIcxNwXC9U+GTYk0vEk6dP1XtR6H9x8y f79OcMRKQLxzv0Gr5LFvy0F08wlVCJRUi31SCi8AZIymZCmEqAKyVMyZc FQUdTucqWhvxzVgCw6X0yoZzjO6a+0656+cuCYWJKP3khO/i5DL6yvVzB A==; X-IronPort-AV: E=McAfee;i="6200,9189,10243"; a="244970229" X-IronPort-AV: E=Sophos;i="5.88,329,1635231600"; d="scan'208";a="244970229" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jan 2022 13:22:10 -0800 X-IronPort-AV: E=Sophos;i="5.88,329,1635231600"; d="scan'208";a="536856960" Received: from avmallar-mobl1.amr.corp.intel.com (HELO rpedgeco-desk.amr.corp.intel.com) ([10.209.123.171]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jan 2022 13:22:09 -0800 From: Rick Edgecombe To: x86@kernel.org, "H . Peter Anvin" , Thomas Gleixner , Ingo Molnar , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann , Andy Lutomirski , Balbir Singh , Borislav Petkov , Cyrill Gorcunov , Dave Hansen , Eugene Syromiatnikov , Florian Weimer , "H . J . Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V . Shankar" , Dave Martin , Weijiang Yang , "Kirill A . Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu Subject: [PATCH 29/35] x86/cet/shstk: Introduce shadow stack token setup/verify routines Date: Sun, 30 Jan 2022 13:18:32 -0800 Message-Id: <20220130211838.8382-30-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com> References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=CwNqrler; spf=none (imf09.hostedemail.com: domain of rick.p.edgecombe@intel.com has no SPF policy when checking 192.55.52.93) smtp.mailfrom=rick.p.edgecombe@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Rspam-User: nil X-Rspamd-Queue-Id: 3769B140004 X-Stat-Signature: d5iqx9rp73dngkz6nf3smyxtzcihxz9x X-Rspamd-Server: rspam12 X-HE-Tag: 1643577731-574223 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Yu-cheng Yu A shadow stack restore token marks a restore point of the shadow stack, and the address in a token must point directly above the token, which is within the same shadow stack. This is distinctively different from other pointers on the shadow stack, since those pointers point to executable code area. Introduce token setup and verify routines. Also introduce WRUSS, which is a kernel-mode instruction but writes directly to user shadow stack. It is used to construct user signal stack as described above. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Kees Cook --- v1: - Use xsave helpers. Yu-cheng v30: - Update commit log, remove description about signals. - Update various comments. - Remove variable 'ssp' init and adjust return value accordingly. - Check get_user_shstk_addr() return value. - Replace 'ia32' with 'proc32'. Yu-cheng v29: - Update comments for the use of get_xsave_addr(). Yu-cheng v28: - Add comments for get_xsave_addr(). Yu-cheng v27: - For shstk_check_rstor_token(), instead of an input param, use current shadow stack pointer. 
- In response to comments, fix/simplify a few syntax/format issues. arch/x86/include/asm/cet.h | 7 ++ arch/x86/include/asm/special_insns.h | 30 +++++++ arch/x86/kernel/shstk.c | 122 +++++++++++++++++++++++++++ 3 files changed, 159 insertions(+) diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 63ee8b45080d..6e8a7a807dcc 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -19,6 +19,9 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags, void shstk_free(struct task_struct *p); int shstk_disable(void); void reset_thread_shstk(void); +int shstk_setup_rstor_token(bool proc32, unsigned long restorer, + unsigned long *new_ssp); +int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp); #else static inline void shstk_setup(void) {} static inline int shstk_alloc_thread_stack(struct task_struct *p, @@ -27,6 +30,10 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p, static inline void shstk_free(struct task_struct *p) {} static inline void shstk_disable(void) {} static inline void reset_thread_shstk(void) {} +static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer, + unsigned long *new_ssp) { return 0; } +static inline int shstk_check_rstor_token(bool proc32, + unsigned long *new_ssp) { return 0; } #endif /* CONFIG_X86_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 68c257a3de0d..f45f378ca1fc 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -222,6 +222,36 @@ static inline void clwb(volatile void *__p) : [pax] "a" (p)); } +#ifdef CONFIG_X86_SHADOW_STACK +static inline int write_user_shstk_32(u32 __user *addr, u32 val) +{ + if (WARN_ONCE(!IS_ENABLED(CONFIG_IA32_EMULATION) && + !IS_ENABLED(CONFIG_X86_X32), + "%s used but not supported.\n", __func__)) { + return -EFAULT; + } + + asm_volatile_goto("1: wrussd %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} + +static inline int write_user_shstk_64(u64 __user *addr, u64 val) +{ + asm_volatile_goto("1: wrussq %[val], (%[addr])\n" + _ASM_EXTABLE(1b, %l[fail]) + :: [addr] "r" (addr), [val] "r" (val) + :: fail); + return 0; +fail: + return -EFAULT; +} +#endif /* CONFIG_X86_SHADOW_STACK */ + #define nop() asm volatile ("nop") static inline void serialize(void) diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 358f24e806cc..e0caab50ca77 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -23,6 +23,33 @@ #include #include +/* + * Create a restore token on the shadow stack. A token is always 8-byte + * and aligned to 8. + */ +static int create_rstor_token(bool proc32, unsigned long ssp, + unsigned long *token_addr) +{ + unsigned long addr; + + /* Aligned to 8 is aligned to 4, so test 8 first */ + if ((!proc32 && !IS_ALIGNED(ssp, 8)) || !IS_ALIGNED(ssp, 4)) + return -EINVAL; + + addr = ALIGN_DOWN(ssp, 8) - 8; + + /* Is the token for 64-bit? 
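+ * (Descriptive note: if so, bit 0 of the stored SSP value is set below as the 64-bit mode flag.)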
*/ + if (!proc32) + ssp |= BIT(0); + + if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) + return -EFAULT; + + *token_addr = addr; + + return 0; +} + static unsigned long alloc_shstk(unsigned long size) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; @@ -213,3 +240,98 @@ int shstk_disable(void) shstk_free(current); return 0; } + +static unsigned long get_user_shstk_addr(void) +{ + void *xstate; + unsigned long long ssp; + + xstate = start_update_xsave_msrs(XFEATURE_CET_USER); + + xsave_rdmsrl(xstate, MSR_IA32_PL3_SSP, &ssp); + + end_update_xsave_msrs(); + + return ssp; +} + +/* + * Create a restore token on the shadow stack, and then push the user-mode + * function return address. + */ +int shstk_setup_rstor_token(bool proc32, unsigned long ret_addr, + unsigned long *new_ssp) +{ + struct thread_shstk *shstk = &current->thread.shstk; + unsigned long ssp, token_addr; + int err; + + if (!shstk->size) + return 0; + + if (!ret_addr) + return -EINVAL; + + ssp = get_user_shstk_addr(); + if (!ssp) + return -EINVAL; + + err = create_rstor_token(proc32, ssp, &token_addr); + if (err) + return err; + + if (proc32) { + ssp = token_addr - sizeof(u32); + err = write_user_shstk_32((u32 __user *)ssp, (u32)ret_addr); + } else { + ssp = token_addr - sizeof(u64); + err = write_user_shstk_64((u64 __user *)ssp, (u64)ret_addr); + } + + if (!err) + *new_ssp = ssp; + + return err; +} + +/* + * Verify the user shadow stack has a valid token on it, and then set + * *new_ssp according to the token. + */ +int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp) +{ + unsigned long token_addr; + unsigned long token; + bool shstk32; + + token_addr = get_user_shstk_addr(); + if (!token_addr) + return -EINVAL; + + if (get_user(token, (unsigned long __user *)token_addr)) + return -EFAULT; + + /* Is mode flag correct? */ + shstk32 = !(token & BIT(0)); + if (proc32 ^ shstk32) + return -EINVAL; + + /* Is busy flag set? */ + if (token & BIT(1)) + return -EINVAL; + + /* Mask out flags */ + token &= ~3UL; + + /* Restore address aligned? */ + if ((!proc32 && !IS_ALIGNED(token, 8)) || !IS_ALIGNED(token, 4)) + return -EINVAL; + + /* Token placed properly?
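+ * (Descriptive note: the restore address, aligned down to 8, must point directly above the token, and it must be a user-space address.)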
*/ + if (((ALIGN_DOWN(token, 8) - 8) != token_addr) || token >= TASK_SIZE_MAX) + return -EINVAL; + + *new_ssp = token; + + return 0; +}

From patchwork Sun Jan 30 21:18:33 2022 From: Rick Edgecombe Subject: [PATCH 30/35] x86/cet/shstk: Handle signals for shadow stack Date: Sun, 30 Jan 2022 13:18:33 -0800

From: Yu-cheng Yu When a signal is handled normally the context is pushed to the stack before handling it. For shadow stacks, since the shadow stack only tracks return addresses, there isn't any state that needs to be pushed. However, there are still a few things that need to be done. These things are userspace visible and will be kernel ABI for shadow stacks. One is to make sure the restorer address is written to the shadow stack, since the signal handler (if not changing ucontext) returns to the restorer, and the restorer calls sigreturn. So add the restorer on the shadow stack before handling the signal, so there is not a conflict when the signal handler returns to the restorer. The other thing to do is to place a restore token on the thread's shadow stack before handling the signal and check it during sigreturn. This is an extra layer of protection to hamper attackers calling sigreturn manually as in SROP-like attacks. So, when handling a signal, push: - a shadow stack restore token pointing to the current shadow stack address - the restorer address below the restore token. In sigreturn, verify the restore token and pop the shadow stack. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe Cc: Andy Lutomirski Cc: Cyrill Gorcunov Cc: Florian Weimer Cc: H. Peter Anvin Cc: Kees Cook --- v1: - Use xsave helpers. - Expand commit log. Yu-cheng v27: - Eliminate saving shadow stack pointer to signal context. Yu-cheng v25: - Update commit log/comments for the sc_ext struct. - Use restorer address already calculated. - Change CONFIG_X86_CET to CONFIG_X86_SHADOW_STACK. - Change X86_FEATURE_CET to X86_FEATURE_SHSTK. - Eliminate writing to MSR_IA32_U_CET for shadow stack. - Change wrmsrl() to wrmsrl_safe() and handle error.
arch/x86/ia32/ia32_signal.c | 25 ++++++++++++++++----- arch/x86/include/asm/cet.h | 4 ++++ arch/x86/kernel/shstk.c | 44 +++++++++++++++++++++++++++++++ arch/x86/kernel/signal.c | 13 +++++++++ 4 files changed, 81 insertions(+), 5 deletions(-) diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c index c9c3859322fa..a8d038409d60 100644 --- a/arch/x86/ia32/ia32_signal.c +++ b/arch/x86/ia32/ia32_signal.c @@ -34,6 +34,7 @@ #include #include #include +#include static inline void reload_segments(struct sigcontext_32 *sc) { @@ -112,6 +113,10 @@ COMPAT_SYSCALL_DEFINE0(sigreturn) if (!ia32_restore_sigcontext(regs, &frame->sc)) goto badframe; + + if (restore_signal_shadow_stack()) + goto badframe; + return regs->ax; badframe: @@ -137,6 +142,9 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn) if (!ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (compat_restore_altstack(&frame->uc.uc_stack)) goto badframe; @@ -261,6 +269,9 @@ int ia32_setup_frame(int sig, struct ksignal *ksig, restorer = &frame->retcode; } + if (setup_signal_shadow_stack(1, restorer)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -318,6 +329,15 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig, frame = get_sigframe(ksig, regs, sizeof(*frame), &fp); + if (ksig->ka.sa.sa_flags & SA_RESTORER) + restorer = ksig->ka.sa.sa_restorer; + else + restorer = current->mm->context.vdso + + vdso_image_32.sym___kernel_rt_sigreturn; + + if (setup_signal_shadow_stack(1, restorer)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -333,11 +353,6 @@ int ia32_setup_rt_frame(int sig, struct ksignal *ksig, unsafe_put_user(0, &frame->uc.uc_link, Efault); unsafe_compat_save_altstack(&frame->uc.uc_stack, regs->sp, Efault); - if (ksig->ka.sa.sa_flags & SA_RESTORER) + restorer = ksig->ka.sa.sa_restorer; - else - restorer = current->mm->context.vdso + - vdso_image_32.sym___kernel_rt_sigreturn; unsafe_put_user(ptr_to_compat(restorer), &frame->pretcode, Efault); /* diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 6e8a7a807dcc..faff8dc86159 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -22,6 +22,8 @@ void reset_thread_shstk(void); int shstk_setup_rstor_token(bool proc32, unsigned long restorer, unsigned long *new_ssp); int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp); +int setup_signal_shadow_stack(int proc32, void __user *restorer); +int restore_signal_shadow_stack(void); #else static inline void shstk_setup(void) {} static inline int shstk_alloc_thread_stack(struct task_struct *p, @@ -34,6 +36,8 @@ static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer, unsigned long *new_ssp) { return 0; } static inline int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp) { return 0; } +static inline int setup_signal_shadow_stack(int proc32, void __user *restorer) { return 0; } +static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_SHADOW_STACK */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index e0caab50ca77..682d85a63a1d 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -335,3 +335,47 @@ int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp) return 0; } + +int setup_signal_shadow_stack(int proc32, void __user *restorer) +{ + struct thread_shstk *shstk = &current->thread.shstk; + unsigned long new_ssp; + void *xstate; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || !shstk->size) + return 0; + + err = shstk_setup_rstor_token(proc32, (unsigned long)restorer, + &new_ssp); + if (err) + return err; + + xstate = start_update_xsave_msrs(XFEATURE_CET_USER); + err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp); + end_update_xsave_msrs(); + + return err; +} + +int restore_signal_shadow_stack(void) +{ + struct thread_shstk *shstk = &current->thread.shstk; + void *xstate; + int proc32 = in_ia32_syscall(); + unsigned long new_ssp; + int err; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK) || !shstk->size) + return 0; + + err = shstk_check_rstor_token(proc32, &new_ssp); + if (err) + return err; + + xstate = start_update_xsave_msrs(XFEATURE_CET_USER); + err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, new_ssp); + end_update_xsave_msrs(); + + return err; +} diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index ec71e06ae364..e6202fc2a56c 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -48,6 +48,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 /* @@ -471,6 +472,9 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, frame = get_sigframe(&ksig->ka, regs, sizeof(struct rt_sigframe), &fp); uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(0, ksig->ka.sa.sa_restorer)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -576,6 +580,9 @@ static int x32_setup_rt_frame(struct ksignal *ksig, uc_flags = frame_uc_flags(regs); + if (setup_signal_shadow_stack(0, ksig->ka.sa.sa_restorer)) + return -EFAULT; + if (!user_access_begin(frame, sizeof(*frame))) return -EFAULT; @@ -674,6 +681,9 @@ SYSCALL_DEFINE0(rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (restore_altstack(&frame->uc.uc_stack)) goto badframe; @@ -991,6 +1001,9 @@ COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn) if (!restore_sigcontext(regs, &frame->uc.uc_mcontext, uc_flags)) goto badframe; + if (restore_signal_shadow_stack()) + goto badframe; + if (compat_restore_altstack(&frame->uc.uc_stack)) goto badframe;

From patchwork Sun Jan 30 21:18:34 2022 From: Rick Edgecombe Subject: [PATCH 31/35] x86/cet/shstk: Add arch_prctl elf feature functions Date: Sun, 30 Jan 2022 13:18:34 -0800

From: Yu-cheng Yu Some CPU features that adjust the behavior of existing instructions should be enabled only if the application supports these modifications. Provide a per-thread arch_prctl interface for modifying, checking and locking the enablement status of features like these. This interface operates on the per-thread state which is copied for new threads. It is intended to be mostly used early in an application (e.g. by a dynamic loader) such that the behavior will be inherited for new threads created by the application.
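For illustration only (not from the patch; the call semantics are specified below, and the raw-syscall wrapper is an assumption since glibc does not wrap these arch_prctl() codes), a dynamic loader might use the interface roughly like this:

	#include <asm/prctl.h>		/* ARCH_X86_FEATURE_*, LINUX_X86_FEATURE_* */
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Hypothetical helper: issue the arch_prctl() directly. */
	static long x86_feature_prctl(int option, unsigned long arg2)
	{
		return syscall(SYS_arch_prctl, option, arg2);
	}

	static int enable_and_lock_shstk(void)
	{
		unsigned long long status[3];	/* features, shstk base, shstk size */

		if (x86_feature_prctl(ARCH_X86_FEATURE_ENABLE, LINUX_X86_FEATURE_SHSTK))
			return -1;
		/* Lock the decision so later code cannot disable it again. */
		if (x86_feature_prctl(ARCH_X86_FEATURE_LOCK, LINUX_X86_FEATURE_SHSTK))
			return -1;
		/* Read back the enabled features and shadow stack base/size. */
		return x86_feature_prctl(ARCH_X86_FEATURE_STATUS, (unsigned long)status);
	}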
Today the only user is shadow stack, but keep the names generic because other features like LAM can use it as well. The interface is as follows: arch_prctl(ARCH_X86_FEATURE_STATUS, u64 *args) Get feature status. The parameter 'args' is a pointer to a user buffer. The kernel returns the following information: *args = shadow stack/IBT status *(args + 1) = shadow stack base address *(args + 2) = shadow stack size 32-bit binaries use the same interface, but only the lower 32 bits of each item are used. arch_prctl(ARCH_X86_FEATURE_DISABLE, unsigned int features) Disable features specified in 'features'. Return -EPERM if any of the passed features are locked. Return -ECANCELED if any of the features failed to disable. In this case call ARCH_X86_FEATURE_STATUS to find out which features are still enabled. arch_prctl(ARCH_X86_FEATURE_ENABLE, unsigned int features) Enable features specified in 'features'. Return -EPERM if any of the passed features are locked. Return -ECANCELED if any of the features failed to enable. In this case call ARCH_X86_FEATURE_STATUS to find out which features were enabled. arch_prctl(ARCH_X86_FEATURE_LOCK, unsigned int features) Lock in all features at their current enabled or disabled status. Signed-off-by: Yu-cheng Yu Signed-off-by: Rick Edgecombe --- v1: - Changed from ENOSYS and ENOTSUPP error codes per checkpatch. - Changed interface/filename to be more generic so it can be shared with LAM. - Add lock mask, such that some features can be locked, while leaving others to be enabled later. - Add ARCH_X86_FEATURE_ENABLE to use instead of parsing the elf header. - Change ARCH_X86_FEATURE_DISABLE to actually return an error on failure. arch/x86/include/asm/cet.h | 6 +++ arch/x86/include/asm/processor.h | 1 + arch/x86/include/uapi/asm/prctl.h | 10 +++++ arch/x86/kernel/Makefile | 2 +- arch/x86/kernel/elf_feature_prctl.c | 66 +++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 2 +- arch/x86/kernel/shstk.c | 1 + 7 files changed, 86 insertions(+), 2 deletions(-) create mode 100644 arch/x86/kernel/elf_feature_prctl.c diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index faff8dc86159..cbc7cfcba5dc 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -40,6 +40,12 @@ static inline int setup_signal_shadow_stack(int proc32, void __user *restorer) { static inline int restore_signal_shadow_stack(void) { return 0; } #endif /* CONFIG_X86_SHADOW_STACK */ +#ifdef CONFIG_X86_SHADOW_STACK +int prctl_elf_feature(int option, u64 arg2); +#else +static inline int prctl_elf_feature(int option, u64 arg2) { return -EINVAL; } +#endif + #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_CET_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a9f4e9c4ca81..100af0f570c9 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -531,6 +531,7 @@ struct thread_struct { #ifdef CONFIG_X86_SHADOW_STACK struct thread_shstk shstk; + u64 feat_prctl_locked; #endif /* Floating point and extended processor state */ diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f18..aa294c7bcf41 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -20,4 +20,14 @@ #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003 +#define ARCH_X86_FEATURE_STATUS 0x3001 +#define ARCH_X86_FEATURE_DISABLE 0x3002 +#define ARCH_X86_FEATURE_LOCK 0x3003 +#define ARCH_X86_FEATURE_ENABLE 0x3004 + +/* x86 feature bits to be used with ARCH_X86_FEATURE arch_prctl()s
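+ * (Descriptive note: mask bits; multiple features may be OR-ed together in one call.)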
*/ +#define LINUX_X86_FEATURE_IBT 0x00000001 +#define LINUX_X86_FEATURE_SHSTK 0x00000002 + + #endif /* _ASM_X86_PRCTL_H */ diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index d60ae6c365c7..531dba96d4dc 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -153,7 +153,7 @@ obj-$(CONFIG_AMD_MEM_ENCRYPT) += sev.o obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o -obj-$(CONFIG_X86_SHADOW_STACK) += shstk.o +obj-$(CONFIG_X86_SHADOW_STACK) += shstk.o elf_feature_prctl.o ### # 64 bit specific files ifeq ($(CONFIG_X86_64),y) diff --git a/arch/x86/kernel/elf_feature_prctl.c b/arch/x86/kernel/elf_feature_prctl.c new file mode 100644 index 000000000000..47de201db3f7 --- /dev/null +++ b/arch/x86/kernel/elf_feature_prctl.c @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* See Documentation/x86/cet.rst. */ + +static int elf_feat_copy_status_to_user(struct thread_shstk *shstk, u64 __user *ubuf) +{ + u64 buf[3] = {}; + + if (shstk->size) { + buf[0] = LINUX_X86_FEATURE_SHSTK; + buf[1] = shstk->base; + buf[2] = shstk->size; + } + + return copy_to_user(ubuf, buf, sizeof(buf)); +} + +int prctl_elf_feature(int option, u64 arg2) +{ + struct thread_struct *thread = &current->thread; + u64 feat_succ = 0; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return -EOPNOTSUPP; + + switch (option) { + case ARCH_X86_FEATURE_STATUS: + return elf_feat_copy_status_to_user(&thread->shstk, (u64 __user *)arg2); + case ARCH_X86_FEATURE_DISABLE: + if (arg2 & thread->feat_prctl_locked) + return -EPERM; + + if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_disable()) + feat_succ |= LINUX_X86_FEATURE_SHSTK; + + if (feat_succ != arg2) + return -ECANCELED; + return 0; + case ARCH_X86_FEATURE_ENABLE: + if (arg2 & thread->feat_prctl_locked) + return -EPERM; + + if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_setup()) + feat_succ |= LINUX_X86_FEATURE_SHSTK; + + if (feat_succ != arg2) + return -ECANCELED; + return 0; + case ARCH_X86_FEATURE_LOCK: + thread->feat_prctl_locked |= arg2; + return 0; + + default: + return -EINVAL; + } +} diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 0fbcf33255fa..11bf09b60f9d 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -1005,5 +1005,5 @@ long do_arch_prctl_common(struct task_struct *task, int option, return fpu_xstate_prctl(task, option, arg2); } - return -EINVAL; + return prctl_elf_feature(option, arg2); } diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index 682d85a63a1d..f330be17e2d1 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -130,6 +130,7 @@ int shstk_setup(void) void reset_thread_shstk(void) { + current->thread.feat_prctl_locked = 0; memset(&current->thread.shstk, 0, sizeof(struct thread_shstk)); }

From patchwork Sun Jan 30 21:18:35 2022 From: Rick Edgecombe Subject: [PATCH 32/35] x86/cet/shstk: Introduce map_shadow_stack syscall Date: Sun, 30 Jan 2022 13:18:35 -0800

When operating with shadow stacks enabled, the kernel will automatically allocate shadow stacks for new threads, however in some cases userspace will need additional shadow stacks. The main example of this is the ucontext family of functions, which require userspace to allocate and pivot to userspace-managed stacks. Unlike most other user memory permissions, shadow stacks need to be provisioned with special data in order to be useful. They need to be set up with a restore token so that userspace can pivot to them via the RSTORSSP instruction. But the security design of shadow stacks is that they should not be written to except in limited circumstances. This presents a problem for userspace: how to provision this special data without allowing the shadow stack to be generally writable. Previously, a new PROT_SHADOW_STACK was attempted, which could be mprotect()ed from RW permissions after the data was provisioned. This was found to not be secure enough, as other threads could write to the shadow stack during the writable window. The kernel can use a special instruction, WRUSS, to write directly to userspace shadow stacks. So the solution can be that memory can be mapped as shadow stack permissions from the beginning (never generally writable in userspace), and the kernel itself can write the restore token. First, a new madvise() flag was explored, which could operate on the PROT_SHADOW_STACK memory. This had a couple of downsides: 1. Extra checks were needed in mprotect() to prevent writable memory from ever becoming PROT_SHADOW_STACK. 2. Extra checks/vma state were needed in the new madvise() to prevent restore tokens being written into the middle of pre-used shadow stacks. It is ideal to prevent restore tokens being added at arbitrary locations, so the check was to make sure the shadow stack had never been written to. 3. It stood out from the rest of the madvise flags, as more of a direct action than a hint at future desired behavior. So rather than repurpose two existing syscalls (mmap, madvise) that don't quite fit, just implement a new map_shadow_stack syscall to allow userspace to map and set up new shadow stacks in one step. While ucontext is the primary motivator, userspace may have other unforeseen reasons to set up its own shadow stacks using the WRSS instruction. Towards this, provide a flag so that stacks can be optionally set up securely for the common case of ucontext without enabling WRSS.
Or potentially have the kernel set up the shadow stack in some new way. The following example demonstrates how to create a new shadow stack with map_shadow_stack: void *shadow_stack = map_shadow_stack(stack_size, SHADOW_STACK_SET_TOKEN); Signed-off-by: Rick Edgecombe --- v1: - New patch (replaces PROT_SHADOW_STACK). arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/x86/include/uapi/asm/mman.h | 2 ++ arch/x86/kernel/shstk.c | 39 +++++++++++++++++++++++--- include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 2 +- kernel/sys_ni.c | 1 + 7 files changed, 42 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 320480a8db4f..68106c12937f 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -455,3 +455,4 @@ 448 i386 process_mrelease sys_process_mrelease 449 i386 futex_waitv sys_futex_waitv 450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node +451 i386 map_shadow_stack sys_map_shadow_stack diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c84d12608cd2..d9639e3e0a33 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -372,6 +372,7 @@ 448 common process_mrelease sys_process_mrelease 449 common futex_waitv sys_futex_waitv 450 common set_mempolicy_home_node sys_set_mempolicy_home_node +451 common map_shadow_stack sys_map_shadow_stack # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h index 9704e27c4d24..dd4e8405e189 100644 --- a/arch/x86/include/uapi/asm/mman.h +++ b/arch/x86/include/uapi/asm/mman.h @@ -26,6 +26,8 @@ ((key) & 0x8 ? 
VM_PKEY_BIT3 : 0)) #endif +#define SHADOW_STACK_SET_TOKEN 0x1 /* Set up a restore token in the shadow stack */ + #include #endif /* _UAPI_ASM_X86_MMAN_H */ diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c index f330be17e2d1..53be5d5539d4 100644 --- a/arch/x86/kernel/shstk.c +++ b/arch/x86/kernel/shstk.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -45,12 +46,14 @@ static int create_rstor_token(bool proc32, unsigned long ssp, if (write_user_shstk_64((u64 __user *)addr, (u64)ssp)) return -EFAULT; - *token_addr = addr; + if (token_addr) + *token_addr = addr; return 0; } -static unsigned long alloc_shstk(unsigned long size) +static unsigned long alloc_shstk(unsigned long size, unsigned long token_offset, + bool set_res_tok) { int flags = MAP_ANONYMOUS | MAP_PRIVATE; struct mm_struct *mm = current->mm; @@ -61,6 +64,15 @@ static unsigned long alloc_shstk(unsigned long size) &unused, NULL); mmap_write_unlock(mm); + if (!set_res_tok || IS_ERR_VALUE(addr)) + goto out; + + if (create_rstor_token(in_ia32_syscall(), addr + token_offset, NULL)) { + vm_munmap(addr, size); + return -EINVAL; + } + +out: return addr; } @@ -103,7 +115,7 @@ int shstk_setup(void) return 1; size = PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G)); - addr = alloc_shstk(size); + addr = alloc_shstk(size, size, false); if (IS_ERR_VALUE(addr)) return 1; @@ -181,7 +193,7 @@ int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, return -EINVAL; stack_size = PAGE_ALIGN(stack_size); - addr = alloc_shstk(stack_size); + addr = alloc_shstk(stack_size, stack_size, false); if (IS_ERR_VALUE(addr)) { shstk->base = 0; shstk->size = 0; @@ -380,3 +392,22 @@ int restore_signal_shadow_stack(void) return err; } + +SYSCALL_DEFINE2(map_shadow_stack, unsigned long, size, unsigned int, flags) +{ + unsigned long aligned_size; + + if (!cpu_feature_enabled(X86_FEATURE_SHSTK)) + return -ENOSYS; + + /* + * An overflow would result in attempting to write the restore token + * to the wrong location. Not catastrophic, but just return the right + * error code and block it. 
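+ * (Descriptive note: for example, a size of -1UL would PAGE_ALIGN() to 0.)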
+ */ + aligned_size = PAGE_ALIGN(size); + if (aligned_size < size) + return -EOVERFLOW; + + return alloc_shstk(aligned_size, size, flags & SHADOW_STACK_SET_TOKEN); +} diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 819c0cb00b6d..11220c40b26a 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1060,6 +1060,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long len, unsigned long home_node, unsigned long flags); +asmlinkage long sys_map_shadow_stack(unsigned long size, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1c48b0ae3ba3..41112fdd3b66 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -887,7 +887,7 @@ __SYSCALL(__NR_futex_waitv, sys_futex_waitv) __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node) #undef __NR_syscalls -#define __NR_syscalls 451 +#define __NR_syscalls 452 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index a492f159624f..16a6e1a57c2b 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -380,6 +380,7 @@ COND_SYSCALL(vm86old); COND_SYSCALL(modify_ldt); COND_SYSCALL(vm86); COND_SYSCALL(kexec_file_load); +COND_SYSCALL(map_shadow_stack); /* s390 */ COND_SYSCALL(s390_pci_mmio_read);

From patchwork Sun Jan 30 21:18:36 2022 From: Rick Edgecombe Subject: [PATCH 33/35] selftests/x86: Add map_shadow_stack syscall test Date: Sun, 30 Jan 2022 13:18:36 -0800

Add a simple selftest for exercising the new map_shadow_stack syscall. Co-developed-by: Yu, Yu-cheng Signed-off-by: Yu, Yu-cheng Signed-off-by: Rick Edgecombe --- v1: - New patch.
tools/testing/selftests/x86/Makefile | 9 ++- .../selftests/x86/test_map_shadow_stack.c | 75 +++++++++++++++++++ 2 files changed, 83 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/x86/test_map_shadow_stack.c diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 8a1f62ab3c8e..9114943336f9 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -9,11 +9,13 @@ UNAME_M := $(shell uname -m) CAN_BUILD_I386 := $(shell ./check_cc.sh $(CC) trivial_32bit_program.c -m32) CAN_BUILD_X86_64 := $(shell ./check_cc.sh $(CC) trivial_64bit_program.c) CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) +CAN_BUILD_WITH_SHSTK := $(shell ./check_cc.sh $(CC) trivial_program.c -mshstk -fcf-protection) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vsyscall mov_ss_trap \ - syscall_arg_fault fsgsbase_restore sigaltstack + syscall_arg_fault fsgsbase_restore sigaltstack \ + test_map_shadow_stack TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer @@ -105,3 +107,8 @@ $(OUTPUT)/test_syscall_vdso_32: thunks_32.S # state. $(OUTPUT)/check_initial_reg_state_32: CFLAGS += -Wl,-ereal_start -static $(OUTPUT)/check_initial_reg_state_64: CFLAGS += -Wl,-ereal_start -static + +ifeq ($(CAN_BUILD_WITH_SHSTK),1) +$(OUTPUT)/test_map_shadow_stack_64: CFLAGS += -mshstk -fcf-protection +$(OUTPUT)/test_map_shadow_stack_32: CFLAGS += -mshstk -fcf-protection +endif \ No newline at end of file diff --git a/tools/testing/selftests/x86/test_map_shadow_stack.c b/tools/testing/selftests/x86/test_map_shadow_stack.c new file mode 100644 index 000000000000..dfd94ef0176d --- /dev/null +++ b/tools/testing/selftests/x86/test_map_shadow_stack.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SS_SIZE 0x200000 + +void *create_shstk(void) +{ + return (void *)syscall(__NR_map_shadow_stack, SS_SIZE, SHADOW_STACK_SET_TOKEN); +} + +#if (__GNUC__ < 8) || (__GNUC__ == 8 && __GNUC_MINOR__ < 5) +int main(int argc, char *argv[]) +{ + printf("SKIP: compiler does not support CET."); + return 0; +} +#else +void try_shstk(unsigned long new_ssp) +{ + unsigned long ssp0, ssp1; + + printf("pid=%d\n", getpid()); + printf("new_ssp = %lx, *new_ssp = %lx\n", + new_ssp, *((unsigned long *)new_ssp)); + + ssp0 = _get_ssp(); + printf("changing ssp from %lx to %lx\n", ssp0, new_ssp); + + /* Make sure it is aligned to 8 bytes */ + if ((ssp0 & 0xf) != 0) + ssp0 &= -8; + + asm volatile("rstorssp (%0)\n":: "r" (new_ssp)); + asm volatile("saveprevssp"); + ssp1 = _get_ssp(); + printf("ssp is now %lx\n", ssp1); + + ssp0 -= 8; + asm volatile("rstorssp (%0)\n":: "r" (ssp0)); + asm volatile("saveprevssp"); +} + +int main(int argc, char *argv[]) +{ + void *shstk; + + if (!_get_ssp()) { + printf("SKIP: shadow stack disabled."); + return 0; + } + + shstk = create_shstk(); + if (shstk == MAP_FAILED) { + printf("FAIL: Error creating shadow stack: %d\n", errno); + return 1; + } + try_shstk((unsigned long)shstk + SS_SIZE - 8); + + printf("PASS.\n"); + return 0; +} +#endif

From patchwork Sun Jan 30 21:18:37 2022 From: Rick Edgecombe Subject: [PATCH 34/35] x86/cet/shstk: Support wrss for userspace Date: Sun, 30 Jan 2022 13:18:37 -0800

For the current shadow stack implementation, shadow stack contents cannot be arbitrarily provisioned with data. This property helps apps protect themselves better, but also restricts any potential apps that may want to do exotic things at the expense of a little security. The x86 shadow stack feature introduces a new instruction, wrss, which can be enabled to write directly to shadow stack permissioned memory from userspace. Allow it to be enabled via the prctl interface. Only enable the userspace wrss instruction, which allows writes to userspace shadow stacks from userspace. Do not allow it to be enabled independently of shadow stack, as HW does not support using WRSS when shadow stack is disabled. Prevent shadow stacks from becoming executable to assist apps that want W^X enforced. Add an arch_validate_flags() implementation to handle the check. Rename the uapi/asm/mman.h header guard to be able to use it for arch/x86/include/asm/mman.h, where the arch_validate_flags() will be. From a fault handler perspective, WRSS will behave very similarly to WRUSS, which is treated like a user access from a PF err code perspective. Signed-off-by: Rick Edgecombe --- v1: - New patch.
 arch/x86/include/asm/cet.h          |  3 +++
 arch/x86/include/asm/mman.h         |  5 ++++-
 arch/x86/include/uapi/asm/prctl.h   |  2 +-
 arch/x86/kernel/elf_feature_prctl.c |  6 +++++
 arch/x86/kernel/shstk.c             | 35 ++++++++++++++++++++++++++++-
 5 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index cbc7cfcba5dc..c8ff0bd5f5bc 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -10,6 +10,7 @@ struct task_struct;
 struct thread_shstk {
 	u64	base;
 	u64	size;
+	bool	wrss;
 };
 
 #ifdef CONFIG_X86_SHADOW_STACK
@@ -19,6 +20,7 @@ int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
+int wrss_control(bool enable);
 int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 			    unsigned long *new_ssp);
 int shstk_check_rstor_token(bool proc32, unsigned long *new_ssp);
@@ -32,6 +34,7 @@ static inline int shstk_alloc_thread_stack(struct task_struct *p,
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 static inline void reset_thread_shstk(void) {}
+static inline int wrss_control(bool enable) { return 1; }
 static inline int shstk_setup_rstor_token(bool proc32, unsigned long restorer,
 					  unsigned long *new_ssp) { return 0; }
 static inline int shstk_check_rstor_token(bool proc32,

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
index b44fe31deb3a..c05951a36d93 100644
--- a/arch/x86/include/asm/mman.h
+++ b/arch/x86/include/asm/mman.h
@@ -8,7 +8,10 @@
 #ifdef CONFIG_X86_SHADOW_STACK
 static inline bool arch_validate_flags(unsigned long vm_flags)
 {
-	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
+	/*
+	 * Shadow stack must not be executable, to help with W^X due to wrss.
+	 */
+	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & (VM_WRITE | VM_EXEC)))
 		return false;
 
 	return true;

diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index aa294c7bcf41..210976925325 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -28,6 +28,6 @@
 /* x86 feature bits to be used with ARCH_X86_FEATURE arch_prctl()s */
 #define LINUX_X86_FEATURE_IBT		0x00000001
 #define LINUX_X86_FEATURE_SHSTK	0x00000002
-
+#define LINUX_X86_FEATURE_WRSS	0x00000010
 #endif /* _ASM_X86_PRCTL_H */

diff --git a/arch/x86/kernel/elf_feature_prctl.c b/arch/x86/kernel/elf_feature_prctl.c
index 47de201db3f7..ecad6ebeb4dd 100644
--- a/arch/x86/kernel/elf_feature_prctl.c
+++ b/arch/x86/kernel/elf_feature_prctl.c
@@ -21,6 +21,8 @@ static int elf_feat_copy_status_to_user(struct thread_shstk *shstk, u64 __user *
 		buf[1] = shstk->base;
 		buf[2] = shstk->size;
 	}
+	if (shstk->wrss)
+		buf[0] |= LINUX_X86_FEATURE_WRSS;
 
 	return copy_to_user(ubuf, buf, sizeof(buf));
 }
@@ -40,6 +42,8 @@ int prctl_elf_feature(int option, u64 arg2)
 		if (arg2 & thread->feat_prctl_locked)
 			return -EPERM;
 
+		if (arg2 & LINUX_X86_FEATURE_WRSS && !wrss_control(false))
+			feat_succ |= LINUX_X86_FEATURE_WRSS;
 		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_disable())
 			feat_succ |= LINUX_X86_FEATURE_SHSTK;
@@ -52,6 +56,8 @@ int prctl_elf_feature(int option, u64 arg2)
 		if (arg2 & LINUX_X86_FEATURE_SHSTK && !shstk_setup())
 			feat_succ |= LINUX_X86_FEATURE_SHSTK;
 
+		if (arg2 & LINUX_X86_FEATURE_WRSS && !wrss_control(true))
+			feat_succ |= LINUX_X86_FEATURE_WRSS;
 
 		if (feat_succ != arg2)
 			return -ECANCELED;

diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 53be5d5539d4..92612236b4ef 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -230,6 +230,36 @@ void shstk_free(struct task_struct *tsk)
 	shstk->size = 0;
 }
 
+int wrss_control(bool enable)
+{
+	struct thread_shstk *shstk = &current->thread.shstk;
+	void *xstate;
+	int err;
+
+	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
+		return 1;
+	/*
+	 * Only enable wrss if shadow stack is enabled. If shadow stack is not
+	 * enabled, wrss will already be disabled, so don't bother clearing it
+	 * when disabling.
+	 */
+	if (!shstk->size || shstk->wrss == enable)
+		return 1;
+
+	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
+	if (enable)
+		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET,
+						CET_WRSS_EN, 0);
+	else
+		err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0,
+						CET_WRSS_EN);
+	end_update_xsave_msrs();
+
+	if (err)
+		return 1;
+
+	shstk->wrss = enable;
+	return 0;
+}
+
 int shstk_disable(void)
 {
 	struct thread_shstk *shstk = &current->thread.shstk;
@@ -242,7 +272,9 @@ int shstk_disable(void)
 		return 1;
 
 	xstate = start_update_xsave_msrs(XFEATURE_CET_USER);
-	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0, CET_SHSTK_EN);
+	/* Disable WRSS too when disabling shadow stack */
+	err = xsave_set_clear_bits_msrl(xstate, MSR_IA32_U_CET, 0,
+					CET_SHSTK_EN | CET_WRSS_EN);
 	if (!err)
 		err = xsave_wrmsrl(xstate, MSR_IA32_PL3_SSP, 0);
 	end_update_xsave_msrs();
@@ -251,6 +283,7 @@ int shstk_disable(void)
 		return 1;
 
 	shstk_free(current);
+	shstk->wrss = 0;
 	return 0;
 }

From patchwork Sun Jan 30 21:18:38 2022
X-Patchwork-Submitter: Rick Edgecombe
X-Patchwork-Id: 12730165
From: Rick Edgecombe
Subject: [PATCH 35/35] x86/cpufeatures: Limit shadow stack to Intel CPUs
Date: Sun, 30 Jan 2022 13:18:38 -0800
Message-Id: <20220130211838.8382-36-rick.p.edgecombe@intel.com>
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>
References: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

Shadow stack is supported on newer AMD processors, but the kernel
implementation has not been tested on them. To keep basic issues from
reaching normal users, disable shadow stack on all CPUs except Intel
until it has been tested there, at which point this limitation should be
removed.

Signed-off-by: Rick Edgecombe
---
v1:
 - New patch.

 arch/x86/kernel/cpu/common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ee339f5b8ca..7fbfe707a1db 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -517,6 +517,14 @@ __setup("nopku", setup_disable_pku);
 
 static __always_inline void setup_cet(struct cpuinfo_x86 *c)
 {
+	/*
+	 * Shadow stack is supported on AMD processors, but has not been
+	 * tested there. Only support it on Intel processors until this is
+	 * done, at which point this vendor check should be removed.
+	 */
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		setup_clear_cpu_cap(X86_FEATURE_SHSTK);
+
 	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
 		return;
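[Editor's illustration] A practical consequence of this vendor check:
CPUID can still advertise shadow stack on non-Intel parts even though the
kernel now clears X86_FEATURE_SHSTK there, so user space should consult
the kernel's view (/proc/cpuinfo or the feature status arch_prctl())
rather than raw CPUID. For reference, a minimal probe of the hardware
bit (CET_SS, CPUID.(EAX=7,ECX=0):ECX bit 7) looks like the sketch below;
it checks only hardware capability, not kernel support:

	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CET_SS is reported in CPUID.(EAX=7,ECX=0):ECX, bit 7 */
		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
			return 1;

		if (ecx & (1u << 7))
			puts("hardware supports CET shadow stack");
		else
			puts("no hardware shadow stack support");

		/*
		 * Even when the bit is set, the kernel may have masked
		 * the feature (as this patch does on non-Intel CPUs);
		 * check /proc/cpuinfo for the kernel's effective view.
		 */
		return 0;
	}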