From patchwork Mon Feb  3 20:17:43 2020
X-Patchwork-Submitter: Andrea Arcangeli <aarcange@redhat.com>
X-Patchwork-Id: 11363389
From: Andrea Arcangeli <aarcange@redhat.com>
To: Will Deacon <will@kernel.org>, Catalin Marinas <catalin.marinas@arm.com>,
    Jon Masters <jcm@jonmasters.org>, Rafael Aquini <aquini@redhat.com>,
    Mark Salter <msalter@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org
Subject: [RFC] [PATCH 0/2] arm64: tlb: skip tlbi broadcast for single threaded TLB flushes
Date: Mon,  3 Feb 2020 15:17:43 -0500
Message-Id: <20200203201745.29986-1-aarcange@redhat.com>

Hello everyone,

I've been testing the arm64 ARMv8 tlbi broadcast instruction and it
seems it doesn't scale in SMP, which opens the door for using similar
tricks to what alpha, ia64, mips, powerpc, sh and sparc are doing to
optimize TLB flushes for single threaded processes. This should be
even more beneficial in NUMA or multi-socket systems, where the
"ASID" and "vaddr" information has to cross a longer distance before
the tlbi broadcast instruction can be retired.

This mm_users logic is not standardized across arches: every arch
does it in its own way. Not only are the implementations different,
but different arches are also trying to optimize different cases
through the mm_users checks in the arch code:

1) avoiding remote TLB flushes when mm_users <= 1

2) avoiding even local TLB flushes during exit_mmap->unmap_vmas when
   mm_users == 0

For now I only tried to implement 1) on arm64, but I'm left wondering
which other arches can achieve 2), and in turn which kernel code
could write to the very userland virtual addresses being unmapped
during exit_mmap, such that flushing them through the tlb gather
mechanism would strictly be required.

This is just an RFC to find out whether this would be a viable
optimization. A basic microbenchmark is included in the commit header
of the patch.

Thanks,
Andrea

Andrea Arcangeli (2):
  mm: use_mm: fix for arches checking mm_users to optimize TLB flushes
  arm64: tlb: skip tlbi broadcast for single threaded TLB flushes

 arch/arm64/include/asm/efi.h         |  2 +-
 arch/arm64/include/asm/mmu.h         |  3 +-
 arch/arm64/include/asm/mmu_context.h | 10 +--
 arch/arm64/include/asm/tlbflush.h    | 91 +++++++++++++++++++++++++++-
 arch/arm64/mm/context.c              | 13 +++-
 mm/mmu_context.c                     |  2 +
 6 files changed, 111 insertions(+), 10 deletions(-)

Some examples of the scattered status of 2) follow:

ia64:
==
flush_tlb_mm (struct mm_struct *mm)
{
	if (!mm)
		return;

	set_bit(mm->context, ia64_ctx.flushmap);
	mm->context = 0;

	if (atomic_read(&mm->mm_users) == 0)
		return;		/* happens as a result of exit_mmap() */
[..]
==

sparc:
==
void flush_tlb_all(void)
{
	/*
	 * Don't bother flushing if this address space is about to be
	 * destroyed.
	 */
	if (atomic_read(&current->mm->mm_users) == 0)
		return;
[..]

static void fix_range(struct mm_struct *mm, unsigned long start_addr,
		      unsigned long end_addr, int force)
{
	/*
	 * Don't bother flushing if this address space is about to be
	 * destroyed.
	 */
	if (atomic_read(&mm->mm_users) == 0)
		return;
[..]
==

arc:
==
noinline void local_flush_tlb_mm(struct mm_struct *mm)
{
	/*
	 * Small optimisation courtesy IA64
	 * flush_mm called during fork,exit,munmap etc, multiple times as well.
	 * Only for fork( ) do we need to move parent to a new MMU ctxt,
	 * all other cases are NOPs, hence this check.
	 */
	if (atomic_read(&mm->mm_users) == 0)
		return;
[..]
==
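
For clarity, here is a minimal, purely illustrative sketch of the
generic shape of optimization 1) above. It is NOT the actual arm64
implementation from patch 2/2 (which also has to cope with the mm
being lazily active as active_mm on other CPUs), and the example_*
symbols are made up for illustration only:

==
/* Illustrative sketch only: none of these symbols exist in the kernel. */
static inline void example_flush_tlb_mm(struct mm_struct *mm)
{
	/*
	 * Single threaded and currently running on this CPU: ignoring
	 * lazy-TLB active_mm users (which the real patch must handle),
	 * no other CPU can hold stale TLB entries for this mm, so a
	 * CPU-local flush is enough and the tlbi broadcast is skipped.
	 */
	if (atomic_read(&mm->mm_users) <= 1 && current->mm == mm) {
		example_local_flush_tlb_mm(mm);
		return;
	}

	/* Otherwise fall back to the system-wide tlbi broadcast. */
	example_broadcast_flush_tlb_mm(mm);
}
==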
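
The microbenchmark itself is in the commit header of patch 2/2 and is
not reproduced here; purely as an illustration of the kind of
single-threaded workload that exercises this path, a userspace
mmap/populate/munmap loop along these lines generates a steady stream
of TLB flushes that, with the patch applied, no longer require the
tlbi broadcast:

==
/* Illustrative only: not the microbenchmark referenced above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE	(2UL * 1024 * 1024)
#define LOOPS	10000

int main(void)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < LOOPS; i++) {
		char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		/* populate the mapping so munmap has TLB entries to flush */
		memset(p, 1, SIZE);
		munmap(p, SIZE);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.3f sec\n", (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}
==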