From patchwork Fri Jul 19 00:58:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nadav Amit X-Patchwork-Id: 11049711 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55AF7912 for ; Fri, 19 Jul 2019 01:00:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3F79428791 for ; Fri, 19 Jul 2019 01:00:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2D0CF287CF; Fri, 19 Jul 2019 01:00:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8CEFC28791 for ; Fri, 19 Jul 2019 01:00:39 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hoHEo-0004Hf-GG; Fri, 19 Jul 2019 00:59:06 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1hoHEm-0004Ha-Iq for xen-devel@lists.xenproject.org; Fri, 19 Jul 2019 00:59:04 +0000 X-Inumbo-ID: 63b8b4fc-a9c0-11e9-b4aa-07b1dbb27dad Received: from mail-pf1-f195.google.com (unknown [209.85.210.195]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 63b8b4fc-a9c0-11e9-b4aa-07b1dbb27dad; Fri, 19 Jul 2019 00:58:58 +0000 (UTC) Received: by mail-pf1-f195.google.com with SMTP id y15so13391417pfn.5 for ; Thu, 18 Jul 2019 17:58:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=RWWNgv0WcWjfE9wWYHQhoeNKyBfGxLtvDshgZHYc1vE=; b=KRnqL8zV/3jmAFdYj5ALST7u0cIJRM3l7asOiaVGXSP8hpI9I2YvqY2Jzumpp7zFSe BJx19B7foCQ3kJPsZNnGKwFZtB8cnoX/svyh2NjmmcWAJQteEbcXO2bPXv4hO2AF6Fzq 7ScihtpmNSJTeghxs97YBvv+IF+8SuYcthCUWhUenUxVNQG1NbjT45XdRZRDgGRvkGjd XNH/feE0BBF0h8oilcPsZqsVSeYadxU+8qHkyEIMEs9T8xD819ITSLyKSV3HDZCDMjZ1 PdlrZ0zQIBnumshlj7qc1BNC0knTyjndLP/ycKKL7VobXTTfUBSfAXrkx/VJWPEHejH9 yNpg== X-Gm-Message-State: APjAAAUbbpGVRdLzseGL1AW0rhrGDwWO3ISW9gSzXuR7WsYjxxn1ZlrA zftPKYQWLM+b8e5pe3huNBM= X-Google-Smtp-Source: APXvYqxZoIKjz9ViGfUIEExjwwIbnjfGtWb7w9xiUnCbHyVRrBrLCxFCFJP5cS9IAYKOxxdelYaABg== X-Received: by 2002:a63:a346:: with SMTP id v6mr4814748pgn.57.1563497937499; Thu, 18 Jul 2019 17:58:57 -0700 (PDT) Received: from htb-2n-eng-dhcp405.eng.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id j128sm14025166pfg.28.2019.07.18.17.58.55 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 18 Jul 2019 17:58:56 -0700 (PDT) From: Nadav Amit To: Andy Lutomirski , Dave Hansen Date: Thu, 18 Jul 2019 17:58:28 -0700 Message-Id: <20190719005837.4150-1-namit@vmware.com> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Subject: [Xen-devel] [PATCH v3 0/9] x86: Concurrent TLB flushes X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Sasha Levin , linux-hyperv@vger.kernel.org, Stephen Hemminger , kvm@vger.kernel.org, Paolo Bonzini , Rik van Riel , Peter Zijlstra , Haiyang Zhang , x86@kernel.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, xen-devel@lists.xenproject.org, Ingo Molnar , Nadav Amit , Josh Poimboeuf , Borislav Petkov , Thomas Gleixner , "K. Y. Srinivasan" , Boris Ostrovsky Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP [ Cover-letter is identical to v2, including benchmark results, excluding the change log. ] Currently, local and remote TLB flushes are not performed concurrently, which introduces unnecessary overhead - each INVLPG can take 100s of cycles. This patch-set allows TLB flushes to be run concurrently: first request the remote CPUs to initiate the flush, then run it locally, and finally wait for the remote CPUs to finish their work. In addition, there are various small optimizations to avoid unwarranted false-sharing and atomic operations. The proposed changes should also improve the performance of other invocations of on_each_cpu(). Hopefully, no one has relied on this behavior of on_each_cpu() that invoked functions first remotely and only then locally [Peter says he remembers someone might do so, but without further information it is hard to know how to address it]. Running sysbench on dax/ext4 w/emulated-pmem, write-cache disabled on 2-socket, 48-logical-cores (24+SMT) Haswell-X, 5 repetitions: sysbench fileio --file-total-size=3G --file-test-mode=rndwr \ --file-io-mode=mmap --threads=X --file-fsync-mode=fdatasync run Th. tip-jun28 avg (stdev) +patch-set avg (stdev) change --- --------------------- ---------------------- ------ 1 1267765 (14146) 1299253 (5715) +2.4% 2 1734644 (11936) 1799225 (19577) +3.7% 4 2821268 (41184) 2919132 (40149) +3.4% 8 4171652 (31243) 4376925 (65416) +4.9% 16 5590729 (24160) 5829866 (8127) +4.2% 24 6250212 (24481) 6522303 (28044) +4.3% 32 3994314 (26606) 4077543 (10685) +2.0% 48 4345177 (28091) 4417821 (41337) +1.6% (Note that on configurations with up to 24 threads numactl was used to set all threads on socket 1, which explains the drop in performance when going to 32 threads). Running the same benchmark with security mitigations disabled (PTI, Spectre, MDS): Th. tip-jun28 avg (stdev) +patch-set avg (stdev) change --- --------------------- ---------------------- ------ 1 1598896 (5174) 1607903 (4091) +0.5% 2 2109472 (17827) 2224726 (4372) +5.4% 4 3448587 (11952) 3668551 (30219) +6.3% 8 5425778 (29641) 5606266 (33519) +3.3% 16 6931232 (34677) 7054052 (27873) +1.7% 24 7612473 (23482) 7783138 (13871) +2.2% 32 4296274 (18029) 4283279 (32323) -0.3% 48 4770029 (35541) 4764760 (13575) -0.1% Presumably, PTI requires two invalidations of each mapping, which allows to get higher benefits from concurrency when PTI is on. At the same time, when mitigations are on, other overheads reduce the potential speedup. I tried to reduce the size of the code of the main patch, which required restructuring of the series. v2 -> v3: * Open-code the remote/local-flush decision code [Andy] * Fix hyper-v, Xen implementations [Andrew] * Fix redundant TLB flushes. v1 -> v2: * Removing the patches that Thomas took [tglx] * Adding hyper-v, Xen compile-tested implementations [Dave] * Removing UV [Andy] * Adding lazy optimization, removing inline keyword [Dave] * Restructuring patch-set RFCv2 -> v1: * Fix comment on flush_tlb_multi [Juergen] * Removing async invalidation optimizations [Andy] * Adding KVM support [Paolo] Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Boris Ostrovsky Cc: Dave Hansen Cc: Haiyang Zhang Cc: Ingo Molnar Cc: Josh Poimboeuf Cc: Juergen Gross Cc: "K. Y. Srinivasan" Cc: Paolo Bonzini Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sasha Levin Cc: Stephen Hemminger Cc: Thomas Gleixner Cc: kvm@vger.kernel.org Cc: linux-hyperv@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: x86@kernel.org Cc: xen-devel@lists.xenproject.org Nadav Amit (9): smp: Run functions concurrently in smp_call_function_many() x86/mm/tlb: Remove reason as argument for flush_tlb_func_local() x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy() x86/mm/tlb: Flush remote and local TLBs concurrently x86/mm/tlb: Privatize cpu_tlbstate x86/mm/tlb: Do not make is_lazy dirty for no reason cpumask: Mark functions as pure x86/mm/tlb: Remove UV special case x86/mm/tlb: Remove unnecessary uses of the inline keyword arch/x86/hyperv/mmu.c | 10 +- arch/x86/include/asm/paravirt.h | 6 +- arch/x86/include/asm/paravirt_types.h | 4 +- arch/x86/include/asm/tlbflush.h | 47 ++++----- arch/x86/include/asm/trace/hyperv.h | 2 +- arch/x86/kernel/kvm.c | 11 ++- arch/x86/kernel/paravirt.c | 2 +- arch/x86/mm/init.c | 2 +- arch/x86/mm/tlb.c | 133 ++++++++++++++++---------- arch/x86/xen/mmu_pv.c | 11 +-- include/linux/cpumask.h | 6 +- include/linux/smp.h | 27 ++++-- include/trace/events/xen.h | 2 +- kernel/smp.c | 133 ++++++++++++-------------- 14 files changed, 218 insertions(+), 178 deletions(-) Reviewed-by: Dave Hansen