From patchwork Fri May 26 23:44:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13257438
Date: Fri, 26 May 2023 17:44:25 -0600
Message-Id: <20230526234435.662652-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v2 00/10] mm/kvm: locklessly clear the accessed bit
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Alistair Popple, Anup Patel, Ben Gardon, Borislav Petkov,
 Catalin Marinas, Chao Peng, Christophe Leroy, Dave Hansen,
 Fabiano Rosas, Gaosheng Cui, Gavin Shan, "H. Peter Anvin",
 Ingo Molnar, James Morse, "Jason A. Donenfeld", Jason Gunthorpe,
 Jonathan Corbet, Marc Zyngier, Masami Hiramatsu, Michael Ellerman,
 Michael Larabel, Mike Rapoport, Nicholas Piggin, Oliver Upton,
 Paul Mackerras, Peter Xu, Sean Christopherson, Steven Rostedt,
 Suzuki K Poulose, Thomas Gleixner, Thomas Huth, Will Deacon,
 Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-trace-kernel@vger.kernel.org,
 x86@kernel.org, linux-mm@google.com, Yu Zhao

TLDR
====
This patchset adds a fast path to clear the accessed bit without
taking kvm->mmu_lock. It can significantly improve the performance of
guests when the host is under heavy memory pressure.

ChromeOS has been using a similar approach [1] since mid-2021, and it
has proven successful on tens of millions of devices.

This v2 addresses previous requests [2] to refactor code, remove
inaccurate/redundant text, etc.

[1] https://crrev.com/c/2987928
[2] https://lore.kernel.org/r/20230217041230.2417228-1-yuzhao@google.com/

Overview
========
The goal of this patchset is to optimize the performance of guests
when the host memory is overcommitted. It focuses on a simple yet
common case where hardware sets the accessed bit in KVM PTEs and VMs
are not nested. Complex cases fall back to the existing slow path
where kvm->mmu_lock is then taken.

The fast path relies on two techniques to safely clear the accessed
bit: RCU and CAS.
The former protects KVM page tables from being freed while the latter
clears the accessed bit atomically against both the hardware and other
software page table walkers.

A new mmu_notifier_ops member, test_clear_young(), supersedes the
existing clear_young() and test_young(). This extended callback can
operate on a range of KVM PTEs individually according to a bitmap, if
the caller provides it.

Evaluation
==========
An existing selftest can quickly demonstrate the effectiveness of
this patchset. On a generic workstation equipped with 128 CPUs and
256GB DRAM:

  $ sudo max_guest_memory_test -c 64 -m 250 -s 250

  MGLRU         run2
  ------------------
  Before [1]    ~64s
  After         ~51s

  kswapd (MGLRU before)

    100.00%  balance_pgdat
      100.00%  shrink_node
        100.00%  shrink_one
          99.99%  try_to_shrink_lruvec
            99.71%  evict_folios
              97.29%  shrink_folio_list
  ==>>          13.05%  folio_referenced
                  12.83%  rmap_walk_file
                    12.31%  folio_referenced_one
                      7.90%  __mmu_notifier_clear_young
                        7.72%  kvm_mmu_notifier_clear_young
                          7.34%  _raw_write_lock

  kswapd (MGLRU after)

    100.00%  balance_pgdat
      100.00%  shrink_node
        100.00%  shrink_one
          99.99%  try_to_shrink_lruvec
            99.59%  evict_folios
              80.37%  shrink_folio_list
  ==>>           3.74%  folio_referenced
                   3.59%  rmap_walk_file
                     3.19%  folio_referenced_one
                       2.53%  lru_gen_look_around
                         1.06%  __mmu_notifier_test_clear_young

Comprehensive benchmarks are coming soon.

[1] "mm: rmap: Don't flush TLB after checking PTE young for page
    reference" was included so that the comparison is apples to
    apples.
    https://lore.kernel.org/r/20220706112041.3831-1-21cnbao@gmail.com/

Yu Zhao (10):
  mm/kvm: add mmu_notifier_ops->test_clear_young()
  mm/kvm: use mmu_notifier_ops->test_clear_young()
  kvm/arm64: export stage2_try_set_pte() and macros
  kvm/arm64: make stage2 page tables RCU safe
  kvm/arm64: add kvm_arch_test_clear_young()
  kvm/powerpc: make radix page tables RCU safe
  kvm/powerpc: add kvm_arch_test_clear_young()
  kvm/x86: move tdp_mmu_enabled and shadow_accessed_mask
  kvm/x86: add kvm_arch_test_clear_young()
  mm: multi-gen LRU: use mmu_notifier_test_clear_young()

 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 arch/arm64/include/asm/kvm_host.h             |   6 +
 arch/arm64/include/asm/kvm_pgtable.h          |  55 +++++++
 arch/arm64/kvm/arm.c                          |   1 +
 arch/arm64/kvm/hyp/pgtable.c                  |  61 +-------
 arch/arm64/kvm/mmu.c                          |  53 ++++++-
 arch/powerpc/include/asm/kvm_host.h           |   8 +
 arch/powerpc/include/asm/kvm_ppc.h            |   1 +
 arch/powerpc/kvm/book3s.c                     |   6 +
 arch/powerpc/kvm/book3s.h                     |   1 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c        |  65 +++++++-
 arch/powerpc/kvm/book3s_hv.c                  |   5 +
 arch/x86/include/asm/kvm_host.h               |  13 ++
 arch/x86/kvm/mmu.h                            |   6 -
 arch/x86/kvm/mmu/spte.h                       |   1 -
 arch/x86/kvm/mmu/tdp_mmu.c                    |  34 +++++
 include/linux/kvm_host.h                      |  22 +++
 include/linux/mmu_notifier.h                  |  79 ++++++----
 include/linux/mmzone.h                        |   6 +-
 include/trace/events/kvm.h                    |  15 --
 mm/mmu_notifier.c                             |  48 ++----
 mm/rmap.c                                     |   8 +-
 mm/vmscan.c                                   | 139 ++++++++++++++++--
 virt/kvm/kvm_main.c                           | 114 ++++++++------
 24 files changed, 546 insertions(+), 207 deletions(-)
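
The CAS half of the fast path described in the Overview can be
sketched in userspace C11. This is a minimal illustration under stated
assumptions, not code from this series: the demo_* names, the
accessed-bit position, and the bitmap semantics (a set bit selects the
PTE at that index) are all hypothetical, and the RCU protection that
keeps page tables from being freed during the walk is omitted.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position; real KVM PTE layouts differ per architecture. */
#define DEMO_ACCESSED_BIT (UINT64_C(1) << 5)

/*
 * Test, and optionally clear, the accessed bit of one PTE without a lock.
 * The CAS only succeeds if the PTE is unchanged since it was read, so a
 * concurrent update by hardware or another walker is never overwritten.
 * Returns true if the accessed bit was set.
 */
static bool demo_test_clear_young(_Atomic uint64_t *pte, bool clear)
{
	uint64_t old = atomic_load(pte);

	while (old & DEMO_ACCESSED_BIT) {
		if (!clear)
			return true;
		/* On failure, old is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak(pte, &old,
						 old & ~DEMO_ACCESSED_BIT))
			return true;
	}
	return false;
}

/*
 * Walk n PTEs. If bitmap is non-NULL, only visit indexes whose bit is set
 * (assumed semantics for this sketch). Returns the number of visited PTEs
 * that had the accessed bit set.
 */
static size_t demo_test_clear_young_range(_Atomic uint64_t *ptes, size_t n,
					  const unsigned long *bitmap,
					  bool clear)
{
	const size_t bits = sizeof(unsigned long) * CHAR_BIT;
	size_t young = 0;

	for (size_t i = 0; i < n; i++) {
		if (bitmap && !(bitmap[i / bits] & (1UL << (i % bits))))
			continue;
		if (demo_test_clear_young(&ptes[i], clear))
			young++;
	}
	return young;
}
```

Passing clear=false turns the same walk into a pure test, which is one
way a single callback could cover both test_young() and clear_young()
style callers.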