From patchwork Thu Jan 17 00:32:48 2019
X-Patchwork-Submitter: "Edgecombe, Rick P"
X-Patchwork-Id: 10767255
From: Rick Edgecombe
To: Andy Lutomirski, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, hpa@zytor.com,
    Thomas Gleixner, Borislav Petkov, Nadav Amit, Dave Hansen,
    Peter Zijlstra, linux_dti@icloud.com, linux-integrity@vger.kernel.org,
    linux-security-module@vger.kernel.org, akpm@linux-foundation.org,
    kernel-hardening@lists.openwall.com, linux-mm@kvack.org,
    will.deacon@arm.com, ard.biesheuvel@linaro.org,
    kristen@linux.intel.com, deneen.t.dock@intel.com, Nadav Amit,
    Kees Cook, Dave Hansen, Masami Hiramatsu, Rick Edgecombe
Subject: [PATCH 06/17] x86/alternative: use temporary mm for text poking
Date: Wed, 16 Jan 2019 16:32:48 -0800
Message-Id: <20190117003259.23141-7-rick.p.edgecombe@intel.com>
In-Reply-To: <20190117003259.23141-1-rick.p.edgecombe@intel.com>
References: <20190117003259.23141-1-rick.p.edgecombe@intel.com>

From: Nadav Amit

text_poke() can potentially compromise security, as it sets temporary
PTEs in the fixmap. These PTEs might be used to rewrite the kernel code
from other cores, accidentally or maliciously, if an attacker gains the
ability to write to kernel memory.

Moreover, since the remote TLBs are not flushed after the temporary PTEs
are removed, the time window in which the code is writable is not
limited if the fixmap PTEs - maliciously or accidentally - are cached in
the TLB. To address these potential security hazards, use a temporary mm
for patching the code.

Finally, text_poke() is also not conservative enough when mapping pages:
it always tries to map two pages, even when a single one is sufficient.
So be more conservative, and do not map more than needed.

Cc: Andy Lutomirski
Cc: Kees Cook
Cc: Peter Zijlstra
Cc: Dave Hansen
Cc: Masami Hiramatsu
Signed-off-by: Nadav Amit
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/fixmap.h |   2 -
 arch/x86/kernel/alternative.c | 109 +++++++++++++++++++++++++++-------
 arch/x86/xen/mmu_pv.c         |   2 -
 3 files changed, 87 insertions(+), 26 deletions(-)
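[Editor note, not part of the patch: the sketch below is a condensed,
illustrative outline of the new __text_poke() flow described above, with
the cross-page case, KASAN toggling and sanity checks left out. The
helper name poke_outline() is made up for illustration; poking_mm,
poking_addr and the temporary-mm helpers come from earlier patches in
this series. The real implementation is in the diff that follows.]

	/* Illustrative outline only -- see __text_poke() in the diff below. */
	static void *poke_outline(void *addr, const void *opcode, size_t len)
	{
		temporary_mm_state_t prev;
		unsigned long flags;
		spinlock_t *ptl;
		pte_t *ptep;

		local_irq_save(flags);

		/* Map the target page at poking_addr, inside the dedicated poking_mm. */
		ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
		set_pte_at(poking_mm, poking_addr, ptep,
			   mk_pte(virt_to_page(addr), PAGE_KERNEL));

		/* Switch to poking_mm: the writable alias is visible only on this CPU. */
		prev = use_temporary_mm(poking_mm);

		memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);

		/* Tear down the alias and flush only the local TLB entry. */
		pte_clear(poking_mm, poking_addr, ptep);
		__flush_tlb_one_user(poking_addr);

		/* Switching back is serializing; other CPUs never had this mapping. */
		unuse_temporary_mm(prev);
		pte_unmap_unlock(ptep, ptl);

		local_irq_restore(flags);
		return addr;
	}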
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 50ba74a34a37..9da8cccdf3fb 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -103,8 +103,6 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT
 	FIX_PARAVIRT_BOOTMAP,
 #endif
-	FIX_TEXT_POKE1,	/* reserve 2 pages for text_poke() */
-	FIX_TEXT_POKE0, /* first page is last, because allocation is backward */
 #ifdef CONFIG_X86_INTEL_MID
 	FIX_LNW_VRTC,
 #endif
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 57fdde308bb6..8fc4685f3117 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -683,41 +684,105 @@ __ro_after_init unsigned long poking_addr;
 static void *__text_poke(void *addr, const void *opcode, size_t len)
 {
+	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
+	temporary_mm_state_t prev;
+	struct page *pages[2] = {NULL};
 	unsigned long flags;
-	char *vaddr;
-	struct page *pages[2];
-	int i;
+	pte_t pte, *ptep;
+	spinlock_t *ptl;
 
 	/*
-	 * While boot memory allocator is runnig we cannot use struct
-	 * pages as they are not yet initialized.
+	 * While boot memory allocator is running we cannot use struct pages as
+	 * they are not yet initialized.
 	 */
 	BUG_ON(!after_bootmem);
 
 	if (!core_kernel_text((unsigned long)addr)) {
 		pages[0] = vmalloc_to_page(addr);
-		pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
+		if (cross_page_boundary)
+			pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
 	} else {
 		pages[0] = virt_to_page(addr);
 		WARN_ON(!PageReserved(pages[0]));
-		pages[1] = virt_to_page(addr + PAGE_SIZE);
+		if (cross_page_boundary)
+			pages[1] = virt_to_page(addr + PAGE_SIZE);
 	}
-	BUG_ON(!pages[0]);
+	BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));
+
 	local_irq_save(flags);
-	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
-	if (pages[1])
-		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
-	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
-	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
-	clear_fixmap(FIX_TEXT_POKE0);
-	if (pages[1])
-		clear_fixmap(FIX_TEXT_POKE1);
-	local_flush_tlb();
-	sync_core();
-	/* Could also do a CLFLUSH here to speed up CPU recovery; but
-	   that causes hangs on some VIA CPUs. */
-	for (i = 0; i < len; i++)
-		BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+
+	/*
+	 * The lock is not really needed, but this allows to avoid open-coding.
+	 */
+	ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
+
+	/*
+	 * This must not fail; preallocated in poking_init().
+	 */
+	VM_BUG_ON(!ptep);
+
+	pte = mk_pte(pages[0], PAGE_KERNEL);
+	set_pte_at(poking_mm, poking_addr, ptep, pte);
+
+	if (cross_page_boundary) {
+		pte = mk_pte(pages[1], PAGE_KERNEL);
+		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+	}
+
+	/*
+	 * Loading the temporary mm behaves as a compiler barrier, which
+	 * guarantees that the PTE will be set at the time memcpy() is done.
+	 */
+	prev = use_temporary_mm(poking_mm);
+
+	kasan_disable_current();
+	memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len);
+	kasan_enable_current();
+
+	/*
+	 * Ensure that the PTE is only cleared after the instructions of memcpy
+	 * were issued by using a compiler barrier.
+	 */
+	barrier();
+
+	pte_clear(poking_mm, poking_addr, ptep);
+
+	/*
+	 * __flush_tlb_one_user() performs a redundant TLB flush when PTI is on,
+	 * as it also flushes the corresponding "user" address spaces, which
+	 * does not exist.
+	 *
+	 * Poking, however, is already very inefficient since it does not try to
+	 * batch updates, so we ignore this problem for the time being.
+	 *
+	 * Since the PTEs do not exist in other kernel address-spaces, we do
+	 * not use __flush_tlb_one_kernel(), which when PTI is on would cause
+	 * more unwarranted TLB flushes.
+	 *
+	 * There is a slight anomaly here: the PTE is a supervisor-only and
+	 * (potentially) global and we use __flush_tlb_one_user() but this
+	 * should be fine.
+	 */
+	__flush_tlb_one_user(poking_addr);
+	if (cross_page_boundary) {
+		pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);
+		__flush_tlb_one_user(poking_addr + PAGE_SIZE);
+	}
+
+	/*
+	 * Loading the previous page-table hierarchy requires a serializing
+	 * instruction that already allows the core to see the updated version.
+	 * Xen-PV is assumed to serialize execution in a similar manner.
+	 */
+	unuse_temporary_mm(prev);
+
+	pte_unmap_unlock(ptep, ptl);
+	/*
+	 * If the text doesn't match what we just wrote; something is
+	 * fundamentally screwy, there's nothing we can really do about that.
+	 */
+	BUG_ON(memcmp(addr, opcode, len));
+
 	local_irq_restore(flags);
 	return addr;
 }
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 0f4fe206dcc2..82b181fcefe5 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2319,8 +2319,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #elif defined(CONFIG_X86_VSYSCALL_EMULATION)
 	case VSYSCALL_PAGE:
 #endif
-	case FIX_TEXT_POKE0:
-	case FIX_TEXT_POKE1:
 		/* All local page mappings */
 		pte = pfn_pte(phys, prot);
 		break;
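[Editor note, not part of the patch: temporary_mm_state_t,
use_temporary_mm() and unuse_temporary_mm(), which __text_poke() relies
on above, are introduced by an earlier patch in this series and are not
shown here. The sketch below is an approximation of how such helpers can
be built from existing x86 primitives; treat it as an illustration of
the mechanism rather than the series' exact definitions.]

	/* Approximate sketch of the temporary-mm helpers used by __text_poke(). */
	typedef struct {
		struct mm_struct *prev;
	} temporary_mm_state_t;

	/*
	 * Load 'mm' on this CPU only, without touching current->mm, so any
	 * mapping that exists only in 'mm' is visible solely on this CPU.
	 * The caller must have interrupts disabled.
	 */
	static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
	{
		temporary_mm_state_t state;

		lockdep_assert_irqs_disabled();
		state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
		switch_mm_irqs_off(NULL, mm, current);

		return state;
	}

	/* Restore the previously loaded mm; the CR3 switch is serializing. */
	static inline void unuse_temporary_mm(temporary_mm_state_t prev)
	{
		lockdep_assert_irqs_disabled();
		switch_mm_irqs_off(NULL, prev.prev, current);
	}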