From patchwork Fri Feb 17 04:12:26 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13144312
Date: Thu, 16 Feb 2023 21:12:26 -0700
In-Reply-To: <20230217041230.2417228-1-yuzhao@google.com>
Message-Id: <20230217041230.2417228-2-yuzhao@google.com>
References: <20230217041230.2417228-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog
Subject: [PATCH mm-unstable v1 1/5] mm/kvm: add mmu_notifier_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Jonathan Corbet, Michael Larabel, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-mm@google.com, Yu Zhao
mmu_notifier_test_clear_young() allows the caller to safely test and clear the
accessed bit in KVM PTEs without taking the MMU lock.

This patch adds the generic infrastructure; the arch-specific implementations
follow in subsequent patches.

The arch-specific implementations generally rely on two techniques: RCU and
cmpxchg. The former protects KVM page tables from being freed while the latter
clears the accessed bit atomically against both the hardware and other
software page table walkers.

mmu_notifier_test_clear_young() follows two design patterns: fallback and
batching. For any unsupported cases, it can optionally fall back to
mmu_notifier_ops->clear_young(). For a range of KVM PTEs, it can test or test
and clear their accessed bits according to a bitmap provided by the caller.

mmu_notifier_test_clear_young() always returns 0 if fallback is not allowed.
If fallback happens, its return value is similar to that of
mmu_notifier_clear_young().

The bitmap parameter has the following specifications:
1. The number of bits should be at least (end-start)/PAGE_SIZE.
2. The offset of each bit is relative to the end. E.g., the offset
   corresponding to addr is (end-addr)/PAGE_SIZE-1. This is to better suit
   batching while forward looping.
3. For each KVM PTE with the accessed bit set (young), arch-specific
   implementations flip the corresponding bit in the bitmap. The accessed bit
   is only cleared if the old value of that bitmap bit is 1. A caller can
   therefore test or test and clear the accessed bit by setting the
   corresponding bit in the bitmap to 0 or 1, and the new value will be 1 or 0
   for a young KVM PTE.

Signed-off-by: Yu Zhao
---
 include/linux/kvm_host.h     | 29 ++++++++++++++++++
 include/linux/mmu_notifier.h | 40 +++++++++++++++++++++++++
 mm/mmu_notifier.c            | 26 ++++++++++++++++
 virt/kvm/kvm_main.c          | 58 ++++++++++++++++++++++++++++++++++++
 4 files changed, 153 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4f26b244f6d0..df46fc815c8b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2281,4 +2281,33 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+/*
+ * Architectures that implement kvm_arch_test_clear_young() should override
+ * kvm_arch_has_test_clear_young().
+ *
+ * kvm_arch_has_test_clear_young() is allowed to return a false positive. It
+ * can return true if kvm_arch_test_clear_young() is supported but disabled
+ * due to some runtime constraint. In this case, kvm_arch_test_clear_young()
+ * should return false.
+ *
+ * The last parameter to kvm_arch_test_clear_young() is a bitmap with the
+ * following specifications:
+ * 1. The offset of each bit is relative to the second to the last parameter
+ *    lsb_gfn. E.g., the offset corresponding to gfn is lsb_gfn-gfn. This is
+ *    to better suit batching while forward looping.
+ * 2. For each KVM PTE with the accessed bit set, the implementation should
+ *    flip the corresponding bit in the bitmap. It should only clear the
+ *    accessed bit if the old value is 1. This allows the caller to test or
+ *    test and clear the accessed bit.
+ */
+#ifndef kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return false;
+}
+#endif
+
+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range,
+			       gfn_t lsb_gfn, unsigned long *bitmap);
+
 #endif

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 64a3e051c3c4..432b51cd6843 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -122,6 +122,11 @@ struct mmu_notifier_ops {
 			  struct mm_struct *mm,
 			  unsigned long address);
 
+	/* see the comments on mmu_notifier_test_clear_young() */
+	bool (*test_clear_young)(struct mmu_notifier *mn, struct mm_struct *mm,
+				 unsigned long start, unsigned long end,
+				 unsigned long *bitmap);
+
 	/*
 	 * change_pte is called in cases that pte mapping to page is changed:
 	 * for example, when ksm remaps pte to point to a new shared page.
@@ -390,6 +395,9 @@ extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 extern int __mmu_notifier_clear_young(struct mm_struct *mm,
				       unsigned long start,
				       unsigned long end);
+extern int __mmu_notifier_test_clear_young(struct mm_struct *mm,
+					   unsigned long start, unsigned long end,
+					   bool fallback, unsigned long *bitmap);
 extern int __mmu_notifier_test_young(struct mm_struct *mm,
				      unsigned long address);
 extern void __mmu_notifier_change_pte(struct mm_struct *mm,
@@ -432,6 +440,31 @@ static inline int mmu_notifier_clear_young(struct mm_struct *mm,
 	return 0;
 }
 
+/*
+ * This function always returns 0 if fallback is not allowed. If fallback
+ * happens, its return value is similar to that of mmu_notifier_clear_young().
+ *
+ * The bitmap has the following specifications:
+ * 1. The number of bits should be at least (end-start)/PAGE_SIZE.
+ * 2. The offset of each bit is relative to the end. E.g., the offset
+ *    corresponding to addr is (end-addr)/PAGE_SIZE-1. This is to better suit
+ *    batching while forward looping.
+ * 3. For each KVM PTE with the accessed bit set (young), this function flips
+ *    the corresponding bit in the bitmap. It only clears the accessed bit if
+ *    the old value is 1. A caller can test or test and clear the accessed bit
+ *    by setting the corresponding bit in the bitmap to 0 or 1, and the new
+ *    value will be 1 or 0 for a young KVM PTE.
+ */
+static inline int mmu_notifier_test_clear_young(struct mm_struct *mm,
+						unsigned long start, unsigned long end,
+						bool fallback, unsigned long *bitmap)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_test_clear_young(mm, start, end, fallback, bitmap);
+
+	return 0;
+}
+
 static inline int mmu_notifier_test_young(struct mm_struct *mm,
					   unsigned long address)
 {
@@ -684,6 +717,13 @@ static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
 	return 0;
 }
 
+static inline int mmu_notifier_test_clear_young(struct mm_struct *mm,
+						unsigned long start, unsigned long end,
+						bool fallback, unsigned long *bitmap)
+{
+	return 0;
+}
+
 static inline int mmu_notifier_test_young(struct mm_struct *mm,
					   unsigned long address)
 {

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 50c0dde1354f..dd39b9b4d6d3 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -402,6 +402,32 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
 	return young;
 }
 
+/* see the comments on mmu_notifier_test_clear_young() */
+int __mmu_notifier_test_clear_young(struct mm_struct *mm,
+				    unsigned long start, unsigned long end,
+				    bool fallback, unsigned long *bitmap)
+{
+	int key;
+	struct mmu_notifier *mn;
+	int young = 0;
+
+	key = srcu_read_lock(&srcu);
+
+	hlist_for_each_entry_srcu(mn, &mm->notifier_subscriptions->list,
+				  hlist, srcu_read_lock_held(&srcu)) {
+		if (mn->ops->test_clear_young &&
+		    mn->ops->test_clear_young(mn, mm, start, end, bitmap))
+			continue;
+
+		if (fallback && mn->ops->clear_young)
+			young |= mn->ops->clear_young(mn, mm, start, end);
+	}
+
+	srcu_read_unlock(&srcu, key);
+
+	return young;
+}
+
 int __mmu_notifier_test_young(struct mm_struct *mm,
			       unsigned long address)
 {

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9c60384b5ae0..1b465df4a93d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -875,6 +875,63 @@ static int kvm_mmu_notifier_clear_young(struct mmu_notifier *mn,
 	return kvm_handle_hva_range_no_flush(mn, start, end, kvm_age_gfn);
 }
 
+static bool kvm_test_clear_young(struct kvm *kvm, unsigned long start,
+				 unsigned long end, unsigned long *bitmap)
+{
+	int i;
+	int key;
+	bool success = true;
+
+	trace_kvm_age_hva(start, end);
+
+	key = srcu_read_lock(&kvm->srcu);
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		struct interval_tree_node *node;
+		struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+
+		kvm_for_each_memslot_in_hva_range(node, slots, start, end - 1) {
+			gfn_t lsb_gfn;
+			unsigned long hva_start, hva_end;
+			struct kvm_gfn_range range = {
+				.slot = container_of(node, struct kvm_memory_slot,
+						     hva_node[slots->node_idx]),
+			};
+
+			hva_start = max(start, range.slot->userspace_addr);
+			hva_end = min(end - 1, range.slot->userspace_addr +
+				      range.slot->npages * PAGE_SIZE - 1);
+
+			range.start = hva_to_gfn_memslot(hva_start, range.slot);
+			range.end = hva_to_gfn_memslot(hva_end, range.slot) + 1;
+
+			if (WARN_ON_ONCE(range.end <= range.start))
+				continue;
+
+			/* see the comments on the generic kvm_arch_has_test_clear_young() */
+			lsb_gfn = hva_to_gfn_memslot(end - 1, range.slot);
+
+			success = kvm_arch_test_clear_young(kvm, &range, lsb_gfn, bitmap);
+			if (!success)
+				break;
+		}
+	}
+
+	srcu_read_unlock(&kvm->srcu, key);
+
+	return success;
+}
+
+static bool kvm_mmu_notifier_test_clear_young(struct mmu_notifier *mn, struct mm_struct *mm,
+					      unsigned long start, unsigned long end,
+					      unsigned long *bitmap)
+{
+	if (kvm_arch_has_test_clear_young())
+		return kvm_test_clear_young(mmu_notifier_to_kvm(mn), start, end, bitmap);
+
+	return false;
+}
+
 static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long address)
@@ -903,6 +960,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
 	.clear_young		= kvm_mmu_notifier_clear_young,
 	.test_young		= kvm_mmu_notifier_test_young,
+	.test_clear_young	= kvm_mmu_notifier_test_clear_young,
 	.change_pte		= kvm_mmu_notifier_change_pte,
 	.release		= kvm_mmu_notifier_release,
 };
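
For illustration only (not part of this patch): a minimal caller-side sketch of
the bitmap convention described in the commit message. The helper name
count_young_kvm_ptes() and the fixed 64-page range are assumptions made for the
example; the kernel context (PAGE_SIZE, the bitmap helpers, and the new
mmu_notifier_test_clear_young()) is taken as given. It fills the bitmap to
request test-and-clear, disallows fallback, and then reads back which pages
mapped a young KVM PTE using the (end-addr)/PAGE_SIZE-1 offset rule.

#include <linux/bitmap.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

/* Hypothetical example: count young KVM PTEs in a 64-page range [start, end). */
static unsigned int count_young_kvm_ptes(struct mm_struct *mm,
					 unsigned long start, unsigned long end)
{
	unsigned long addr;
	unsigned int young = 0;
	/* one bit per page; this example assumes end - start == 64 * PAGE_SIZE */
	unsigned long bitmap[BITS_TO_LONGS(64)];

	/* setting a bit to 1 requests test *and* clear; 0 would test only */
	bitmap_fill(bitmap, 64);

	/* returns 0 here because fallback is not allowed (4th argument) */
	mmu_notifier_test_clear_young(mm, start, end, false, bitmap);

	for (addr = start; addr < end; addr += PAGE_SIZE) {
		/* bit offsets are relative to end: (end - addr) / PAGE_SIZE - 1 */
		unsigned long bit = (end - addr) / PAGE_SIZE - 1;

		/* a bit flipped from 1 to 0 means the KVM PTE for addr was young */
		if (!test_bit(bit, bitmap))
			young++;
	}

	return young;
}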