From patchwork Fri Feb 17 04:12:29 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13144314
Date: Thu, 16 Feb 2023 21:12:29 -0700
In-Reply-To: <20230217041230.2417228-1-yuzhao@google.com>
Message-Id: <20230217041230.2417228-5-yuzhao@google.com>
References: <20230217041230.2417228-1-yuzhao@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog
Subject: [PATCH mm-unstable v1 4/5] kvm/powerpc: add kvm_arch_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Jonathan Corbet, Michael Larabel, kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
    linux-mm@google.com, Yu Zhao

Add kvm_arch_test_clear_young() for the vast majority of VMs that are
not nested and run on hardware with the radix MMU enabled. The new
helper relies on two techniques, RCU and cmpxchg, to safely test and
clear the accessed bit without taking the MMU lock: the former protects
KVM page tables from being freed while the latter clears the accessed
bit atomically against both the hardware and other software page table
walkers.

Signed-off-by: Yu Zhao
---
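Note (not part of the patch): the snippet below is a minimal, self-contained
userspace sketch of the cmpxchg half of the technique described above, with a
C11 atomic standing in for a hardware-shared PTE and for pte_xchg(). The names
PTE_PRESENT, PTE_ACCESSED and demo_test_clear_young() are made up for the
sketch; the RCU half, which keeps the page table itself alive during the walk,
is not modeled here.

/*
 * Demo: lockless test-and-clear of an "accessed" bit. The compare-and-
 * exchange only succeeds if the entry has not changed under us, so a
 * concurrent update (e.g. the hardware setting another bit) is never
 * lost; the walker simply reports "not cleared" and moves on.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT	(1ULL << 0)
#define PTE_ACCESSED	(1ULL << 8)

/* One "PTE" shared with a hypothetical hardware walker. */
static _Atomic uint64_t demo_pte = PTE_PRESENT | PTE_ACCESSED;

/* Returns true if the entry was young and the accessed bit was cleared. */
static bool demo_test_clear_young(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);

	if (!(old & PTE_PRESENT) || !(old & PTE_ACCESSED))
		return false;

	return atomic_compare_exchange_strong(ptep, &old, old & ~PTE_ACCESSED);
}

int main(void)
{
	printf("first pass:  %d\n", demo_test_clear_young(&demo_pte));	/* 1 */
	printf("second pass: %d\n", demo_test_clear_young(&demo_pte));	/* 0 */
	return 0;
}

Built with any C11 compiler, the first call observes and clears the accessed
bit and the second finds the entry already old.
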
 arch/powerpc/include/asm/kvm_host.h    | 18 ++++++
 arch/powerpc/include/asm/kvm_ppc.h     | 14 +----
 arch/powerpc/kvm/book3s.c              |  7 +++
 arch/powerpc/kvm/book3s.h              |  2 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 78 +++++++++++++++++++++++++-
 arch/powerpc/kvm/book3s_hv.c           | 10 ++--
 6 files changed, 110 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index caea15dcb91d..996850029ce0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -886,4 +886,22 @@ static inline void kvm_arch_exit(void) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
+static inline int kvmppc_radix_possible(void)
+{
+	return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
+}
+
+static inline bool kvmhv_on_pseries(void)
+{
+	return IS_ENABLED(CONFIG_PPC_PSERIES) && !cpu_has_feature(CPU_FTR_HVMODE);
+}
+
+/* see the comments on the generic kvm_arch_has_test_clear_young() */
+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return IS_ENABLED(CONFIG_KVM) && IS_ENABLED(CONFIG_KVM_BOOK3S_HV_POSSIBLE) &&
+	       kvmppc_radix_possible() && !kvmhv_on_pseries();
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index eae9619b6190..0bb772fc12b1 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -277,6 +277,8 @@ struct kvmppc_ops {
 	bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range);
 	bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
 	bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
+	bool (*test_clear_young)(struct kvm *kvm, struct kvm_gfn_range *range,
+				 gfn_t lsb_gfn, unsigned long *bitmap);
 	bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range);
 	void (*free_memslot)(struct kvm_memory_slot *slot);
 	int (*init_vm)(struct kvm *kvm);
@@ -580,18 +582,6 @@ static inline bool kvm_hv_mode_active(void)		{ return false; }
 
 #endif
 
-#ifdef CONFIG_PPC_PSERIES
-static inline bool kvmhv_on_pseries(void)
-{
-	return !cpu_has_feature(CPU_FTR_HVMODE);
-}
-#else
-static inline bool kvmhv_on_pseries(void)
-{
-	return false;
-}
-#endif
-
 #ifdef CONFIG_KVM_XICS
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 6d525285dbe8..f4cf330e3e81 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -877,6 +877,13 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 	return kvm->arch.kvm_ops->test_age_gfn(kvm, range);
 }
 
+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range,
+			       gfn_t lsb_gfn, unsigned long *bitmap)
+{
+	return kvm->arch.kvm_ops->test_clear_young &&
+	       kvm->arch.kvm_ops->test_clear_young(kvm, range, lsb_gfn, bitmap);
+}
+
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	return kvm->arch.kvm_ops->set_spte_gfn(kvm, range);
diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h
index 58391b4b32ed..fe9cac423817 100644
--- a/arch/powerpc/kvm/book3s.h
+++ b/arch/powerpc/kvm/book3s.h
@@ -12,6 +12,8 @@ extern void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
 extern bool kvm_unmap_gfn_range_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 extern bool kvm_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 extern bool kvm_test_age_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
+extern bool kvmhv_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range,
+				   gfn_t lsb_gfn, unsigned long *bitmap);
 extern bool kvm_set_spte_gfn_hv(struct kvm *kvm, struct kvm_gfn_range *range);
 
 extern int kvmppc_mmu_init_pr(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 9d3743ca16d5..8476646c554c 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1083,6 +1083,78 @@ bool kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	return ref;
 }
 
+bool kvmhv_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range,
+			    gfn_t lsb_gfn, unsigned long *bitmap)
+{
+	bool success;
+	gfn_t gfn = range->start;
+
+	if (WARN_ON_ONCE(!kvm_arch_has_test_clear_young()))
+		return false;
+
+	/*
+	 * This function relies on two techniques, RCU and cmpxchg, to safely
+	 * test and clear the accessed bit without taking the MMU lock. The
+	 * former protects KVM page tables from being freed while the latter
+	 * clears the accessed bit atomically against both the hardware and
+	 * other software page table walkers.
+	 */
+	rcu_read_lock();
+
+	success = kvm_is_radix(kvm);
+	if (!success)
+		goto unlock;
+
+	/*
+	 * case 1:  this function          kvmppc_switch_mmu_to_hpt()
+	 *
+	 *          rcu_read_lock()
+	 *          test kvm_is_radix()    kvm->arch.radix = 0
+	 *          use kvm->arch.pgtable
+	 *          rcu_read_unlock()
+	 *                                 synchronize_rcu()
+	 *                                 kvmppc_free_radix()
+	 *
+	 *
+	 * case 2:  this function          kvmppc_switch_mmu_to_radix()
+	 *
+	 *                                 kvmppc_init_vm_radix()
+	 *                                 smp_wmb()
+	 *          test kvm_is_radix()    kvm->arch.radix = 1
+	 *          smp_rmb()
+	 *          use kvm->arch.pgtable
+	 */
+	smp_rmb();
+
+	while (gfn < range->end) {
+		pte_t *ptep;
+		pte_t old, new;
+		unsigned int shift;
+
+		ptep = find_kvm_secondary_pte_unlocked(kvm, gfn * PAGE_SIZE, &shift);
+		if (!ptep)
+			goto next;
+
+		VM_WARN_ON_ONCE(!page_count(virt_to_page(ptep)));
+
+		old = READ_ONCE(*ptep);
+		if (!pte_present(old) || !pte_young(old))
+			goto next;
+
+		new = pte_mkold(old);
+
+		/* see the comments on the generic kvm_arch_has_test_clear_young() */
+		if (__test_and_change_bit(lsb_gfn - gfn, bitmap))
+			pte_xchg(ptep, old, new);
+next:
+		gfn += shift ? BIT(shift - PAGE_SHIFT) : 1;
+	}
+unlock:
+	rcu_read_unlock();
+
+	return success;
+}
+
 /* Returns the number of PAGE_SIZE pages that are dirty */
 static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot, int pagenum)
@@ -1464,13 +1536,15 @@ int kvmppc_radix_init(void)
 {
 	unsigned long size = sizeof(void *) << RADIX_PTE_INDEX_SIZE;
 
-	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
+	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pte_ctor);
 	if (!kvm_pte_cache)
 		return -ENOMEM;
 
 	size = sizeof(void *) << RADIX_PMD_INDEX_SIZE;
 
-	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size, 0, pmd_ctor);
+	kvm_pmd_cache = kmem_cache_create("kvm-pmd", size, size,
+					  SLAB_TYPESAFE_BY_RCU, pmd_ctor);
 	if (!kvm_pmd_cache) {
 		kmem_cache_destroy(kvm_pte_cache);
 		return -ENOMEM;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6ba68dd6190b..17b415661282 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5242,6 +5242,8 @@ int kvmppc_switch_mmu_to_hpt(struct kvm *kvm)
 	spin_lock(&kvm->mmu_lock);
 	kvm->arch.radix = 0;
 	spin_unlock(&kvm->mmu_lock);
+	/* see the comments in kvmhv_test_clear_young() */
+	synchronize_rcu();
 	kvmppc_free_radix(kvm);
 
 	lpcr = LPCR_VPM1;
@@ -5266,6 +5268,8 @@ int kvmppc_switch_mmu_to_radix(struct kvm *kvm)
 	if (err)
 		return err;
 	kvmppc_rmap_reset(kvm);
+	/* see the comments in kvmhv_test_clear_young() */
+	smp_wmb();
 	/* Mutual exclusion with kvm_unmap_gfn_range etc. */
 	spin_lock(&kvm->mmu_lock);
 	kvm->arch.radix = 1;
@@ -6165,6 +6169,7 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.unmap_gfn_range = kvm_unmap_gfn_range_hv,
 	.age_gfn = kvm_age_gfn_hv,
 	.test_age_gfn = kvm_test_age_gfn_hv,
+	.test_clear_young = kvmhv_test_clear_young,
 	.set_spte_gfn = kvm_set_spte_gfn_hv,
 	.free_memslot = kvmppc_core_free_memslot_hv,
 	.init_vm = kvmppc_core_init_vm_hv,
@@ -6225,11 +6230,6 @@ static int kvm_init_subcore_bitmap(void)
 	return 0;
 }
 
-static int kvmppc_radix_possible(void)
-{
-	return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
-}
-
 static int kvmppc_book3s_init_hv(void)
 {
 	int r;
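
Note (not part of the patch): below is a self-contained C11 sketch of the
"case 2" ordering documented in kvmhv_test_clear_young(). The release store
plays the role of kvmppc_switch_mmu_to_radix() setting kvm->arch.radix after
smp_wmb(), and the acquire load plays the role of the walker's kvm_is_radix()
check followed by smp_rmb() before it dereferences kvm->arch.pgtable. The
names demo_pgtable, demo_radix, publisher() and walker() are made up for the
sketch.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int demo_pgtable[4];		/* stands in for kvm->arch.pgtable */
static _Atomic int demo_radix;		/* stands in for kvm->arch.radix   */

/* ~ kvmppc_switch_mmu_to_radix(): initialize the table, then publish it. */
static void *publisher(void *arg)
{
	(void)arg;
	for (int i = 0; i < 4; i++)
		demo_pgtable[i] = 100 + i;
	/* release ordering here stands in for smp_wmb() before radix = 1 */
	atomic_store_explicit(&demo_radix, 1, memory_order_release);
	return NULL;
}

/* ~ kvmhv_test_clear_young(): test the flag, then use the table. */
static void *walker(void *arg)
{
	(void)arg;
	/* acquire ordering here stands in for smp_rmb() after the check */
	while (!atomic_load_explicit(&demo_radix, memory_order_acquire))
		;	/* spin until the table has been published */
	printf("walker sees initialized entry: %d\n", demo_pgtable[0]);
	return NULL;
}

int main(void)
{
	pthread_t p, w;

	pthread_create(&w, NULL, walker, NULL);
	pthread_create(&p, NULL, publisher, NULL);
	pthread_join(p, NULL);
	pthread_join(w, NULL);
	return 0;
}

Compiled with -pthread, the walker can only observe demo_radix == 1 after the
table writes are visible, so it always prints 100; this is the same pairing
that lets the lockless walker use kvm->arch.pgtable without taking the MMU
lock.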