From patchwork Thu Jan 10 21:09:32 2019
X-Patchwork-Submitter: Khalid Aziz
X-Patchwork-Id: 10756877
From: Khalid Aziz <khalid.aziz@oracle.com>
To: juergh@gmail.com, tycho@tycho.ws, jsteckli@amazon.de, ak@linux.intel.com, torvalds@linux-foundation.org, liran.alon@oracle.com, keescook@google.com, konrad.wilk@oracle.com
Cc: Khalid Aziz, deepa.srinivasan@oracle.com, chris.hyser@oracle.com, tyhicks@canonical.com, dwmw@amazon.co.uk, andrew.cooper3@citrix.com, jcm@redhat.com, boris.ostrovsky@oracle.com, kanth.ghatraju@oracle.com, joao.m.martins@oracle.com, jmattson@google.com, pradeep.vincent@oracle.com, john.haxby@oracle.com, tglx@linutronix.de, kirill.shutemov@linux.intel.com, hch@lst.de, steven.sistare@oracle.com, kernel-hardening@lists.openwall.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership
Date: Thu, 10 Jan 2019 14:09:32 -0700
I am continuing to build on the work Juerg, Tycho and Julian have done on XPFO. After the last round of updates, we were seeing very significant performance penalties when stale TLB entries were flushed actively after an XPFO TLB update. The benchmark for measuring performance is a kernel build using parallel make. To get full protection from ret2dir attacks, we must flush stale TLB entries. The performance penalty from flushing stale TLB entries grows with the number of cores. On a desktop-class machine with only 4 cores, enabling TLB flush for stale entries causes system time for "make -j4" to go up by a factor of 2.614x, but on a larger machine with 96 cores, system time with "make -j60" goes up by a factor of 26.366x!

I have been working on reducing this performance penalty and have implemented a solution that has had a large impact. When the XPFO code flushes stale TLB entries, it does so for all CPUs on the system, which may include CPUs that have no matching TLB entries or may never be scheduled to run the userspace task causing the TLB flush. The problem is made worse by the fact that if the number of entries being flushed exceeds tlb_single_page_flush_ceiling, a full TLB flush is performed on every CPU. A rogue process can launch a ret2dir attack only from a CPU whose TLB holds the dual physmap mapping for its pages. We can hence defer the TLB flush on a CPU until a process that would have caused a TLB flush is scheduled on that CPU. I have added a cpumask to task_struct which is then used to post a pending TLB flush on CPUs other than the one the process is running on. This cpumask is checked when a process migrates to a new CPU, and the TLB is flushed at that time.

I measured system time for parallel make with an unmodified 4.20 kernel, with 4.20 plus the XPFO patches before this optimization, and again after applying this optimization.
Here are the results:

Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM
make -j60 all:
  4.20                          915.183s
  4.20+XPFO                   24129.354s   26.366x
  4.20+XPFO+Deferred flush     1216.987s    1.330x

Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8 GB RAM
make -j4 all:
  4.20                          607.671s
  4.20+XPFO                    1588.646s    2.614x
  4.20+XPFO+Deferred flush      794.473s    1.307x

30+% overhead is still very high and there is room for improvement. Dave Hansen had suggested batch updating TLB entries and Tycho had created an initial implementation, but I have not been able to get that to work correctly. I am still working on it, and I suspect we will see a noticeable improvement in performance with it.

In the code I added, I post a pending full TLB flush to all other CPUs even when the number of TLB entries being flushed on the current CPU does not exceed tlb_single_page_flush_ceiling. There has to be a better way to do this; I just haven't found an efficient way to implement a delayed, limited TLB flush on other CPUs.

I am not entirely sure if switch_mm_irqs_off() is indeed the right place to perform the pending TLB flush for a CPU. Any feedback on that will be very helpful. Delaying full TLB flushes on other CPUs seems to help tremendously, so if there is a better way to implement the same thing than what I have done in patch 16, I am open to ideas.

Performance with this patch set is good enough to use it as a starting point for further refinement before we merge it into the mainline kernel, hence the RFC. Since not flushing stale TLB entries creates a false sense of security, I would recommend making the TLB flush mandatory and eliminating the "xpfotlbflush" kernel parameter (patch "mm, x86: omit TLB flushing by default for XPFO page table modifications").

What remains to be done beyond this patch series:

1. Performance improvements
2. Remove the xpfotlbflush parameter
3. Re-evaluate the patch "arm64/mm: Add support for XPFO to swiotlb" from
   Juerg. I dropped it for now since the swiotlb code for ARM has changed
   a lot in 4.20.
4.
Extend the patch "xpfo, mm: Defer TLB flushes for non-current CPUs" to
   other architectures besides x86.

---------------------------------------------------------

Juerg Haefliger (5):
  mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
  swiotlb: Map the buffer if it was unmapped by XPFO
  arm64/mm: Add support for XPFO
  arm64/mm, xpfo: temporarily map dcache regions
  lkdtm: Add test for XPFO

Julian Stecklina (4):
  mm, x86: omit TLB flushing by default for XPFO page table modifications
  xpfo, mm: remove dependency on CONFIG_PAGE_EXTENSION
  xpfo, mm: optimize spinlock usage in xpfo_kunmap
  EXPERIMENTAL: xpfo, mm: optimize spin lock usage in xpfo_kmap

Khalid Aziz (2):
  xpfo, mm: Fix hang when booting with "xpfotlbflush"
  xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

Tycho Andersen (5):
  mm: add MAP_HUGETLB support to vm_mmap
  x86: always set IF before oopsing from page fault
  xpfo: add primitives for mapping underlying memory
  arm64/mm: disable section/contiguous mappings if XPFO is enabled
  mm: add a user_virt_to_phys symbol

 .../admin-guide/kernel-parameters.txt |   2 +
 arch/arm64/Kconfig                    |   1 +
 arch/arm64/mm/Makefile                |   2 +
 arch/arm64/mm/flush.c                 |   7 +
 arch/arm64/mm/mmu.c                   |   2 +-
 arch/arm64/mm/xpfo.c                  |  58 ++++
 arch/x86/Kconfig                      |   1 +
 arch/x86/include/asm/pgtable.h        |  26 ++
 arch/x86/include/asm/tlbflush.h       |   1 +
 arch/x86/mm/Makefile                  |   2 +
 arch/x86/mm/fault.c                   |  10 +
 arch/x86/mm/pageattr.c                |  23 +-
 arch/x86/mm/tlb.c                     |  27 ++
 arch/x86/mm/xpfo.c                    | 171 ++++++++++++
 drivers/misc/lkdtm/Makefile           |   1 +
 drivers/misc/lkdtm/core.c             |   3 +
 drivers/misc/lkdtm/lkdtm.h            |   5 +
 drivers/misc/lkdtm/xpfo.c             | 194 ++++++++++++++
 include/linux/highmem.h               |  15 +-
 include/linux/mm.h                    |   2 +
 include/linux/mm_types.h              |   8 +
 include/linux/page-flags.h            |  13 +
 include/linux/sched.h                 |   9 +
 include/linux/xpfo.h                  |  90 +++++++
 include/trace/events/mmflags.h        |  10 +-
 kernel/dma/swiotlb.c                  |   3 +-
 mm/Makefile                           |   1 +
 mm/mmap.c                             |  19 +-
 mm/page_alloc.c                       |   3 +
 mm/util.c                             |  32 +++
 mm/xpfo.c                             | 247 ++++++++++++++++++
 security/Kconfig                      |  29 ++
 32 files changed, 974 insertions(+), 43 deletions(-)
 create mode 100644 arch/arm64/mm/xpfo.c
 create mode 100644 arch/x86/mm/xpfo.c
 create mode 100644 drivers/misc/lkdtm/xpfo.c
 create mode 100644 include/linux/xpfo.h
 create mode 100644 mm/xpfo.c