From patchwork Thu Jan 10 21:09:48 2019
X-Patchwork-Submitter: Khalid Aziz
X-Patchwork-Id: 10756913
From: Khalid Aziz <khalid.aziz@oracle.com>
To: juergh@gmail.com, tycho@tycho.ws, jsteckli@amazon.de, ak@linux.intel.com, torvalds@linux-foundation.org, liran.alon@oracle.com, keescook@google.com, konrad.wilk@oracle.com
Cc: Khalid Aziz, deepa.srinivasan@oracle.com, chris.hyser@oracle.com, tyhicks@canonical.com, dwmw@amazon.co.uk, andrew.cooper3@citrix.com, jcm@redhat.com, boris.ostrovsky@oracle.com, kanth.ghatraju@oracle.com, joao.m.martins@oracle.com, jmattson@google.com, pradeep.vincent@oracle.com, john.haxby@oracle.com, tglx@linutronix.de, kirill.shutemov@linux.intel.com, hch@lst.de, steven.sistare@oracle.com, kernel-hardening@lists.openwall.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v7 16/16] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)
Date: Thu, 10 Jan 2019 14:09:48 -0700
X-Mailer: git-send-email 2.17.1
XPFO flushes kernel space TLB entries for pages that are now mapped in userspace, not only on the current CPU but on all other CPUs as well. If the number of TLB entries to flush exceeds tlb_single_page_flush_ceiling, this results in the entire TLB being flushed on all CPUs. A malicious userspace app can exploit the dual mapping of a physical page caused by physmap only on the CPU it is running on, so there is no good reason to incur the very high cost of a TLB flush on CPUs that may never run the malicious app or that hold no TLB entries for it. The cost of a full TLB flush goes up dramatically on machines with high core counts.

This patch flushes the relevant TLB entries for the current process, or the entire TLB, depending upon the number of entries for the current CPU, and posts a pending TLB flush on all other CPUs when a page is unmapped from kernel space and mapped into userspace. This pending TLB flush is posted for each task separately, and the TLB is flushed on a CPU when a task that has a pending TLB flush posted for that CPU is scheduled on it. This patch does two things: (1) it potentially aggregates multiple TLB flushes into one, and (2) it avoids TLB flushes on CPUs that never run the task that caused the flush. The impact is very significant, especially on machines with large core counts. To illustrate this, the kernel was compiled with "make -j" on two classes of machines - a server with a high core count and a large amount of memory, and a desktop-class machine with more modest specs.
System time for "make -j" on a vanilla 4.20 kernel, on 4.20 with the XPFO patches before applying this patch, and after applying this patch:

Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM
make -j60 all

4.20                        915.183s
4.20+XPFO                 24129.354s   26.366x
4.20+XPFO+Deferred flush   1216.987s    1.330x

Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8 GB RAM
make -j4 all

4.20                        607.671s
4.20+XPFO                  1588.646s    2.614x
4.20+XPFO+Deferred flush    794.473s    1.307x

This patch could use more optimization. For instance, it posts a pending full TLB flush for other CPUs even when the number of TLB entries being flushed does not exceed tlb_single_page_flush_ceiling. Batching more TLB entry flushes, as was suggested for an earlier version of these patches, can help reduce such cases. This same code should be implemented for other architectures as well once finalized.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
---
 arch/x86/include/asm/tlbflush.h |  1 +
 arch/x86/mm/tlb.c               | 27 +++++++++++++++++++++++++++
 arch/x86/mm/xpfo.c              |  2 +-
 include/linux/sched.h           |  9 +++++++++
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index f4204bf377fc..92d23629d01d 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -561,6 +561,7 @@ extern void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
 				bool freed_tables);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+extern void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 03b6b4c2238d..b04a501c850b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -319,6 +319,15 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		__flush_tlb_all();
 	}
 #endif
+
+	/* If there is a pending TLB flush for this CPU due to XPFO
+	 * flush, do it now.
+	 */
+	if (tsk && cpumask_test_and_clear_cpu(cpu, &tsk->pending_xpfo_flush)) {
+		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+		__flush_tlb_all();
+	}
+
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
 	/*
@@ -801,6 +810,24 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	}
 }
 
+void xpfo_flush_tlb_kernel_range(unsigned long start, unsigned long end)
+{
+
+	/* Balance as user space task's flush, a bit conservative */
+	if (end == TLB_FLUSH_ALL ||
+	    (end - start) > tlb_single_page_flush_ceiling << PAGE_SHIFT) {
+		do_flush_tlb_all(NULL);
+	} else {
+		struct flush_tlb_info info;
+
+		info.start = start;
+		info.end = end;
+		do_kernel_range_flush(&info);
+	}
+	cpumask_setall(&current->pending_xpfo_flush);
+	cpumask_clear_cpu(smp_processor_id(), &current->pending_xpfo_flush);
+}
+
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	struct flush_tlb_info info = {
diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c
index bcdb2f2089d2..5aa17cb2c813 100644
--- a/arch/x86/mm/xpfo.c
+++ b/arch/x86/mm/xpfo.c
@@ -110,7 +110,7 @@ inline void xpfo_flush_kernel_tlb(struct page *page, int order)
 		return;
 	}
 
-	flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
+	xpfo_flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * size);
 }
 
 /* Convert a user space virtual address to a physical address.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 291a9bd5b97f..ba298be3b5a1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1206,6 +1206,15 @@ struct task_struct {
 	unsigned long prev_lowest_stack;
 #endif
 
+	/*
+	 * When a full TLB flush is needed to flush stale TLB entries
+	 * for pages that have been mapped into userspace and unmapped
+	 * from kernel space, this TLB flush will be delayed until the
+	 * task is scheduled on that CPU. Keep track of CPUs with
+	 * pending full TLB flush forced by xpfo.
+	 */
+	cpumask_t pending_xpfo_flush;
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.