From patchwork Wed Jul 18 09:40:47 2018
X-Patchwork-Submitter: Joerg Roedel
X-Patchwork-Id: 10531831
From: Joerg Roedel
To:
Thomas Gleixner, Ingo Molnar, "H. Peter Anvin"
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
 Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
 Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
 Eduardo Valentin, Greg KH, Will Deacon, aliguori@amazon.com,
 daniel.gruss@iaik.tugraz.at, hughd@google.com, keescook@google.com,
 Andrea Arcangeli, Waiman Long, Pavel Machek, "David H. Gutteridge",
 jroedel@suse.de, joro@8bytes.org
Subject: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
Date: Wed, 18 Jul 2018 11:40:47 +0200
Message-Id: <1531906876-13451-11-git-send-email-joro@8bytes.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1531906876-13451-1-git-send-email-joro@8bytes.org>
References: <1531906876-13451-1-git-send-email-joro@8bytes.org>

From: Joerg Roedel

It can happen that we enter the kernel from kernel-mode and on the
entry-stack. The most common way this happens is when we get an
exception while loading the user-space segment registers on the
kernel-to-userspace exit path. The segment loading needs to be done
after the entry-stack switch, because the stack-switch needs kernel
%fs for per_cpu access.

When this happens, we need to make sure that we leave the kernel with
the entry-stack again, so that the interrupted code-path runs on the
right stack when switching to the user-cr3. We detect this condition
on kernel-entry by checking CS.RPL and %esp, and if it occurs, we
copy the complete contents of the entry-stack over to the task-stack.
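The kernel-entry check described above can be sketched in C (a simplified model of the detection, not the actual assembly; `SEGMENT_RPL_MASK` is the low two bits of a segment selector, and `frame_cs` stands for the CS value the exception frame recorded):

```c
#include <stdint.h>
#include <stdbool.h>

#define SEGMENT_RPL_MASK 0x3 /* low two bits of a selector: the RPL */

/*
 * If the saved CS has RPL 0, the exception was raised while the CPU
 * was already in kernel mode, so %esp may still point into the
 * entry-stack rather than the task-stack.
 */
static bool entered_from_kernel_mode(uint16_t frame_cs)
{
	return (frame_cs & SEGMENT_RPL_MASK) == 0;
}
```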
This needs to be done because, once we enter the exception handlers,
we might be scheduled out or even migrated to a different CPU, so we
can't rely on the entry-stack contents. We also leave a marker in the
stack-frame to detect this condition on the exit path.

On the exit path the copy is reversed: we copy all of the remaining
task-stack contents back to the entry-stack and switch to it.

Signed-off-by: Joerg Roedel
---
 arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7635925..9d6eceb 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,6 +294,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK

 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -316,6 +319,16 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi

+	/*
+	 * Clear unused upper bits of the dword containing the word-sized CS
+	 * slot in pt_regs in case hardware didn't clear it for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx

@@ -329,8 +342,8 @@
 	 */
 	addl	$(4 * 4), %ecx

-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:

 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -346,6 +359,56 @@
 	cld
 	rep movsl

+	jmp	.Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp	.Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm

@@ -404,6 +467,56 @@
 .endm

 /*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -764,6 +877,7 @@ restore_all:

 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
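The address arithmetic at .Lentry_from_kernel_\@ (mask %esp down to the entry-stack base, add the stack size to get the top, subtract %esp to get the byte count) and the bit-31 marker round-trip can be modeled in C. The stack size below is an illustrative assumption; the real MASK_entry_stack/SIZEOF_entry_stack values come from the kernel's asm-offsets:

```c
#include <stdint.h>

/* Illustrative size only; the kernel derives these via asm-offsets. */
#define SIZEOF_ENTRY_STACK 0x1000u
#define MASK_ENTRY_STACK   (~(SIZEOF_ENTRY_STACK - 1))

#define CS_FROM_ENTRY_STACK (1u << 31) /* marker bit in the CS dword */

/* Bytes in use between %esp and the top of the entry-stack. */
static uint32_t bytes_on_entry_stack(uint32_t esp)
{
	uint32_t top = (esp & MASK_ENTRY_STACK) + SIZEOF_ENTRY_STACK;
	return top - esp;
}

/* Set the marker on entry; the iret path tests and clears it. */
static uint32_t mark_from_entry_stack(uint32_t cs_dword)
{
	return cs_dword | CS_FROM_ENTRY_STACK;
}
```

Because the marker lives in the upper bits of the CS slot, the earlier `andl $(0x0000ffff), PT_CS(%esp)` step is what guarantees those bits start out clear before the marker is set.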