From patchwork Fri Apr 1 18:08:18 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12798580
From: Nadav Amit
To: Andrew Morton
Cc: linux-mm@kvack.org, David Hildenbrand, Nadav Amit, Andrea Arcangeli,
 Andrew Cooper, Andy Lutomirski, Dave Hansen, Peter Xu, Peter Zijlstra,
 Thomas Gleixner, Will Deacon, Yu Zhao, Nick Piggin, x86@kernel.org
Subject: [PATCH v6 0/3] mm/mprotect: avoid unnecessary TLB flushes
Date: Fri, 1 Apr 2022 11:08:18 -0700
Message-Id: <20220401180821.1986781-1-namit@vmware.com>

From: Nadav Amit

This patch-set is intended to remove unnecessary TLB flushes during
mprotect() syscalls. Once this patch-set makes it through, similar and
further optimizations for MADV_COLD and userfaultfd would be possible.

Basically, there are 3 optimizations in this patch-set:

1. Use TLB batching infrastructure to batch flushes across VMAs and do
   better/fewer flushes. This would also be handy for later userfaultfd
   enhancements.
2. Avoid unnecessary TLB flushes. This optimization is the one that
   provides most of the performance benefits. Unlike previous versions,
   we now skip only those flushes whose omission cannot cause spurious
   page-faults.
3. Avoid TLB flushes on change_huge_pmd() that are only needed to
   prevent the A/D bits from changing.

Andrew asked for some benchmark numbers. I do not have an easy
deterministic macrobenchmark in which it is easy to show benefit. I
therefore ran a microbenchmark: a loop that does the following on
anonymous memory, just as a sanity check to see that time is saved by
avoiding TLB flushes. The loop goes:

	mprotect(p, PAGE_SIZE, PROT_READ)
	mprotect(p, PAGE_SIZE, PROT_READ|PROT_WRITE)
	*p = 0; // make the page writable

The test was run in a KVM guest with 1 or 2 threads (the second thread
was busy-looping).
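
For anyone who wants to try something similar, the loop above could be
sketched roughly as below. This is only an illustration and not the
program that produced the numbers in this letter: the names
run_mprotect_loop() and struct loop_timing are made up for the sketch,
and it times with clock_gettime() in nanoseconds as an assumption,
whereas the measurements here are in cycles. The busy-looping second
thread is left out.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record; not part of the original benchmark. */
struct loop_timing {
	size_t iterations;	/* completed iterations */
	uint64_t ns_prot_read;	/* total ns in mprotect(PROT_READ) */
	uint64_t ns_prot_rw;	/* total ns in mprotect(PROT_READ|PROT_WRITE) */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Returns 0 on success, -1 if mmap() or mprotect() fails. */
int run_mprotect_loop(size_t iterations, struct loop_timing *out)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	volatile char *p;
	uint64_t t0, t1, t2;
	void *map;
	size_t i;
	int ret = 0;

	out->iterations = 0;
	out->ns_prot_read = 0;
	out->ns_prot_rw = 0;

	/* One page of private anonymous memory, as in the loop above. */
	map = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	p = map;
	*p = 0;			/* fault the page in before timing */

	for (i = 0; i < iterations; i++) {
		t0 = now_ns();
		if (mprotect(map, page_size, PROT_READ)) {
			ret = -1;
			break;
		}
		t1 = now_ns();
		if (mprotect(map, page_size, PROT_READ | PROT_WRITE)) {
			ret = -1;
			break;
		}
		t2 = now_ns();
		*p = 0;		/* make the page writable */

		out->ns_prot_read += t1 - t0;
		out->ns_prot_rw += t2 - t1;
		out->iterations++;
	}

	munmap(map, page_size);
	return ret;
}
```

Dividing each total by out->iterations gives a per-call average. The
timer overhead is included in both totals, so such numbers are only
comparable between kernels, not meaningful on their own.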

I measured the time (cycles) of each operation:

                     1 thread                2 threads
                     mmots   +patch          mmots   +patch
PROT_READ            3494    2725 (-22%)     8630    7788 (-10%)
PROT_READ|WRITE      3952    2724 (-31%)     9075    2865 (-68%)

[ mmots = v5.17-rc6-mmots-2022-03-06-20-38 ]

The exact numbers are really meaningless, but the benefit is clear.
There are 2 interesting results though.

(1) PROT_READ is cheaper, while one can expect it not to be affected.
This is presumably due to a TLB miss that is saved.

(2) Without memory access (*p = 0), the speedup of the patch is even
greater. In that scenario mprotect(PROT_READ) also avoids the TLB flush.
As a result both operations on the patched kernel take roughly 1500
cycles (with either 1 or 2 threads), whereas on mmots their cost is as
high as presented in the table.

Cc: Andrea Arcangeli
Cc: Andrew Cooper
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Yu Zhao
Cc: Nick Piggin
Cc: x86@kernel.org

---

v5 -> v6:
* Wrong patch 2 was sent on v5

v4 -> v5:
* Avoid only TLB flushes that would not result in spurious PF [Dave]
* Better comments, names in pte_flags_need_flush() [Dave]

v3 -> v4:
* Remove KNL-related stuff [Dave]
* Check error code sanity on every PF [Dave]
* Reduce nesting, simplify access_error() changes [Dave]
* Remove redundant present->non-present check
* Use break instead of goto in do_mprotect_pkey()
* Add missing change_prot_numa() chunk

v2 -> v3:
* Fix order of patches (order could lead to breakage)
* Better comments
* Clearer KNL detection [Dave]
* Assertion on PF error-code [Dave]
* Comments, code, function names improvements [PeterZ]
* Flush on access-bit clearing on PMD changes to follow the way
  flushing on x86 is done today in the kernel.

v1 -> v2:
* Wrong detection of permission demotion [Andrea]
* Better comments [Andrea]
* Handle THP [Andrea]
* Batching across VMAs [Peter Xu]
* Avoid open-coding PTE analysis
* Fix wrong use of the mmu_gather()

Nadav Amit (3):
  mm/mprotect: use mmu_gather
  mm/mprotect: do not flush when not required architecturally
  mm: avoid unnecessary flush on change_huge_pmd()

 arch/x86/include/asm/pgtable.h       |   5 ++
 arch/x86/include/asm/pgtable_types.h |   2 +
 arch/x86/include/asm/tlbflush.h      | 121 +++++++++++++++++++++++++++
 arch/x86/mm/pgtable.c                |  10 +++
 fs/exec.c                            |   6 +-
 include/asm-generic/tlb.h            |  14 ++++
 include/linux/huge_mm.h              |   5 +-
 include/linux/mm.h                   |   5 +-
 include/linux/pgtable.h              |  20 +++++
 mm/huge_memory.c                     |  19 +++--
 mm/mempolicy.c                       |   9 +-
 mm/mprotect.c                        |  93 ++++++++++----------
 mm/pgtable-generic.c                 |   8 ++
 mm/userfaultfd.c                     |   6 +-
 14 files changed, 268 insertions(+), 55 deletions(-)