From patchwork Thu Dec  7 15:03:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexandre Ghiti
X-Patchwork-Id: 13483400
From: Alexandre Ghiti
To: Catalin Marinas, Will Deacon, Thomas Bogendoerfer, Michael Ellerman,
 Nicholas Piggin, Christophe Leroy, Paul Walmsley, Palmer Dabbelt,
 Albert Ou, Andrew Morton, Ved Shanbhogue, Matt Evans, Dylan Jhong,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-mm@kvack.org
Cc: Alexandre Ghiti
Subject: [PATCH RFC/RFT 3/4] riscv: Stop emitting preventive sfence.vma for
 new userspace mappings
Date: Thu, 7 Dec 2023 16:03:47 +0100
Message-Id: <20231207150348.82096-4-alexghiti@rivosinc.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231207150348.82096-1-alexghiti@rivosinc.com>
References: <20231207150348.82096-1-alexghiti@rivosinc.com>

The preventive sfence.vma instructions were emitted because new mappings
must be made visible to the page table walker, whether or not the uarch
caches invalid entries.

Actually, there is no need to emit a preventive sfence.vma on new
userspace mappings: this should be handled only in the page fault path.

This drastically reduces the number of sfence.vma emitted:

* Ubuntu boot to login:
Before: ~630k sfence.vma
After:  ~200k sfence.vma

* ltp - mmapstress01
Before: ~45k
After:  ~6.3k

* lmbench - lat_pagefault
Before: ~665k
After:   832 (!)

* lmbench - lat_mmap
Before: ~546k
After:   718 (!)

The only issue with the removal of sfence.vma in update_mmu_cache() is
that on uarchs that cache invalid entries, those entries won't be
invalidated until the process takes a fault, which means one additional
fault in those cases.

Signed-off-by: Alexandre Ghiti
---
 arch/arm64/include/asm/pgtable.h              |  2 +-
 arch/mips/include/asm/pgtable.h               |  6 +--
 arch/powerpc/include/asm/book3s/64/tlbflush.h |  8 ++--
 arch/riscv/include/asm/pgtable.h              | 43 +++++++++++--------
 include/linux/pgtable.h                       |  8 +++-
 mm/memory.c                                   | 12 +++++-
 6 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f7d9b1df4e5..728f25f529a5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -57,7 +57,7 @@ static inline bool arch_thp_swp_supported(void)
  * fault on one CPU which has been handled concurrently by another CPU
  * does not need to perform additional invalidation.
  */
-#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
+#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) do { } while (0)
 
 /*
  * ZERO_PAGE is a global shared page that is always zero: used
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 430b208c0130..84439fe6ed29 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -478,9 +478,9 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
 	return __pgprot(prot);
 }
 
-static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
-						unsigned long address,
-						pte_t *ptep)
+static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
+						      unsigned long address,
+						      pte_t *ptep)
 {
 }
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 1950c1b825b4..7166d56f90db 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -128,10 +128,10 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 #define flush_tlb_page(vma, addr)	local_flush_tlb_page(vma, addr)
 #endif /* CONFIG_SMP */
 
-#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
-static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
-						unsigned long address,
-						pte_t *ptep)
+#define flush_tlb_fix_spurious_write_fault flush_tlb_fix_spurious_write_fault
+static inline void flush_tlb_fix_spurious_write_fault(struct vm_area_struct *vma,
+						      unsigned long address,
+						      pte_t *ptep)
 {
 	/*
 	 * Book3S 64 does not require spurious fault flushes because the PTE
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index b2ba3f79cfe9..89aa5650f104 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -472,28 +472,20 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
 		struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr)
 {
-	/*
-	 * The kernel assumes that TLBs don't cache invalid entries, but
-	 * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
-	 * cache flush; it is necessary even after writing invalid entries.
-	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
-	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
-	 */
-	while (nr--)
-		local_flush_tlb_page(address + nr * PAGE_SIZE);
 }
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
-#define update_mmu_tlb update_mmu_cache
+static inline void update_mmu_tlb(struct vm_area_struct *vma,
+				  unsigned long address, pte_t *ptep)
+{
+	flush_tlb_range(vma, address, address + PAGE_SIZE);
+}
 
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp)
 {
-	pte_t *ptep = (pte_t *)pmdp;
-
-	update_mmu_cache(vma, address, ptep);
 }
 
 #define __HAVE_ARCH_PTE_SAME
@@ -548,13 +540,26 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 					unsigned long address, pte_t *ptep,
 					pte_t entry, int dirty)
 {
-	if (!pte_same(*ptep, entry))
+	if (!pte_same(*ptep, entry)) {
 		__set_pte_at(ptep, entry);
-	/*
-	 * update_mmu_cache will unconditionally execute, handling both
-	 * the case that the PTE changed and the spurious fault case.
-	 */
-	return true;
+		/* Here only not svadu is impacted */
+		flush_tlb_page(vma, address);
+		return true;
+	}
+
+	return false;
+}
+
+extern u64 nr_sfence_vma_handle_exception;
+extern bool tlb_caching_invalid_entries;
+
+#define flush_tlb_fix_spurious_read_fault flush_tlb_fix_spurious_read_fault
+static inline void flush_tlb_fix_spurious_read_fault(struct vm_area_struct *vma,
+						     unsigned long address,
+						     pte_t *ptep)
+{
+	if (tlb_caching_invalid_entries)
+		flush_tlb_page(vma, address);
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index af7639c3b0a3..7abaf42ef612 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -931,8 +931,12 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 # define pte_accessible(mm, pte)	((void)(pte), 1)
 #endif
 
-#ifndef flush_tlb_fix_spurious_fault
-#define flush_tlb_fix_spurious_fault(vma, address, ptep) flush_tlb_page(vma, address)
+#ifndef flush_tlb_fix_spurious_write_fault
+#define flush_tlb_fix_spurious_write_fault(vma, address, ptep) flush_tlb_page(vma, address)
+#endif
+
+#ifndef flush_tlb_fix_spurious_read_fault
+#define flush_tlb_fix_spurious_read_fault(vma, address, ptep)
 #endif
 
 /*
diff --git a/mm/memory.c b/mm/memory.c
index 517221f01303..5cb0ccf0c03f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5014,8 +5014,16 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 		 * with threads.
 		 */
 		if (vmf->flags & FAULT_FLAG_WRITE)
-			flush_tlb_fix_spurious_fault(vmf->vma, vmf->address,
-						     vmf->pte);
+			flush_tlb_fix_spurious_write_fault(vmf->vma, vmf->address,
+							   vmf->pte);
+		else
+			/*
+			 * With the pte_same(ptep_get(vmf->pte), entry) check
+			 * that calls update_mmu_tlb() above, multiple threads
+			 * faulting at the same time won't get there.
+			 */
+			flush_tlb_fix_spurious_read_fault(vmf->vma, vmf->address,
+							  vmf->pte);
 	}
 unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
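
Illustration only, not part of the patch: the trade-off described in the
commit message (fewer sfence.vma, but one extra fault per page on uarchs
that cache invalid entries) can be sketched with a toy userspace model in C.
Everything here is an assumption made for the sketch: simulate(), struct
counters, enum tlb_state and the one-slot-per-page "TLB" are hypothetical
names and do not correspond to kernel code or to any real RISC-V
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64

/* Hypothetical per-page TLB slot: nothing cached, or a cached translation. */
enum tlb_state { TLB_EMPTY, TLB_INVALID, TLB_VALID };

struct counters {
	unsigned long faults;	/* traps taken by the modelled process */
	unsigned long sfences;	/* sfence.vma emitted by the modelled kernel */
};

/*
 * Touch npages never-mapped pages once each and count faults and fences.
 * eager:          fence right after installing the PTE (old behaviour).
 * caches_invalid: the modelled uarch keeps invalid translations in its TLB.
 */
static struct counters simulate(size_t npages, bool eager, bool caches_invalid)
{
	bool pte_valid[MAX_PAGES] = { false };
	enum tlb_state tlb[MAX_PAGES] = { TLB_EMPTY };
	struct counters c = { 0, 0 };

	assert(npages <= MAX_PAGES);

	for (size_t i = 0; i < npages; i++) {
		for (;;) {
			/* Hardware side: on a TLB miss, walk the page table. */
			if (tlb[i] == TLB_EMPTY)
				tlb[i] = pte_valid[i] ? TLB_VALID :
					 caches_invalid ? TLB_INVALID : TLB_EMPTY;
			if (tlb[i] == TLB_VALID)
				break;	/* the access succeeds */

			/* Kernel side: page fault handler. */
			c.faults++;
			if (!pte_valid[i]) {
				pte_valid[i] = true;	/* install the new mapping */
				if (eager) {		/* preventive fence */
					tlb[i] = TLB_EMPTY;
					c.sfences++;
				}
			} else if (caches_invalid) {
				/* Spurious fault: the PTE is fine, the TLB is stale. */
				tlb[i] = TLB_EMPTY;
				c.sfences++;
			}
		}
	}
	return c;
}
```

In this model, simulate(n, false, false) takes n faults and emits no fence,
while simulate(n, false, true) emits n fences but takes 2n faults, which is
the "additional fault" mentioned above. The model ignores mappings installed
without a prior access (for example fault-around), so it says nothing about
the measured numbers.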