From patchwork Thu Sep 26 01:34:52 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13812679
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Date: Thu, 26 Sep 2024 01:34:52 +0000
In-Reply-To: <20240926013506.860253-1-jthoughton@google.com>
Mime-Version: 1.0
References: <20240926013506.860253-1-jthoughton@google.com>
X-Mailer: git-send-email 2.46.0.792.g87dc391469-goog
Message-ID: <20240926013506.860253-5-jthoughton@google.com>
Subject: [PATCH v7 04/18] KVM: x86/mmu: Relax locking for kvm_test_age_gfn and kvm_age_gfn
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: Andrew Morton, David Matlack, David Rientjes, James Houghton,
 Jason Gunthorpe, Jonathan Corbet, Marc Zyngier, Oliver Upton,
 Wei Xu, Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

Walk the TDP MMU in an RCU read-side critical section without holding
mmu_lock when harvesting and potentially updating age information on
sptes. This requires a way to do RCU-safe walking of the tdp_mmu_roots;
do this with a new macro.

The PTE modifications are now done atomically, and
kvm_tdp_mmu_spte_need_atomic_write() has been updated to account for the
fact that kvm_age_gfn() can now locklessly update the Accessed bit and
the W/R/X bits.

If the cmpxchg for marking the spte for access tracking fails, leave the
spte as-is and treat it as young: a spte that is being actively modified
is most likely young.

Harvesting age information from the shadow MMU is still done while
holding the MMU write lock.

Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
Two standalone userspace sketches illustrating the new locking scheme
follow the diff.

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/mmu/mmu.c          | 10 ++++--
 arch/x86/kvm/mmu/tdp_iter.h     | 14 ++++----
 arch/x86/kvm/mmu/tdp_mmu.c      | 57 ++++++++++++++++++++++++---------
 5 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 46e0a466d7fb..adc814bad4bb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1454,6 +1454,7 @@ struct kvm_arch {
	 * tdp_mmu_page set.
	 *
	 * For reads, this list is protected by:
+	 *	RCU alone or
	 *	the MMU lock in read mode + RCU or
	 *	the MMU lock in write mode
	 *
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index faed96e33e38..3928e9b2d84a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
	depends on X86_LOCAL_APIC
	select KVM_COMMON
	select KVM_GENERIC_MMU_NOTIFIER
+	select KVM_MMU_NOTIFIER_YOUNG_LOCKLESS
	select HAVE_KVM_IRQCHIP
	select HAVE_KVM_PFNCACHE
	select HAVE_KVM_DIRTY_RING_TSO
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d94354bb2f8..355a66c26517 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1649,8 +1649,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
	bool young = false;

-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_rmap_age_gfn_range(kvm, range, false);
+		write_unlock(&kvm->mmu_lock);
+	}

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1662,8 +1665,11 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
	bool young = false;

-	if (kvm_memslots_have_rmaps(kvm))
+	if (kvm_memslots_have_rmaps(kvm)) {
+		write_lock(&kvm->mmu_lock);
 		young = kvm_rmap_age_gfn_range(kvm, range, true);
+		write_unlock(&kvm->mmu_lock);
+	}

	if (tdp_mmu_enabled)
		young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index ec171568487c..510936a8455a 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -39,10 +39,11 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 }

 /*
- * SPTEs must be modified atomically if they are shadow-present, leaf
- * SPTEs, and have volatile bits, i.e. has bits that can be set outside
- * of mmu_lock. The Writable bit can be set by KVM's fast page fault
- * handler, and Accessed and Dirty bits can be set by the CPU.
+ * SPTEs must be modified atomically if they have bits that can be set outside
+ * of the mmu_lock. This can happen for any shadow-present leaf SPTEs, as the
+ * Writable bit can be set by KVM's fast page fault handler, the Accessed and
+ * Dirty bits can be set by the CPU, and the Accessed and R/X bits can be
+ * cleared by age_gfn_range.
  *
  * Note, non-leaf SPTEs do have Accessed bits and those bits are
  * technically volatile, but KVM doesn't consume the Accessed bit of
@@ -53,8 +54,7 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
 static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
 {
	return is_shadow_present_pte(old_spte) &&
-	       is_last_spte(old_spte, level) &&
-	       spte_has_volatile_bits(old_spte);
+	       is_last_spte(old_spte, level);
 }

 static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
@@ -70,8 +70,6 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
					  u64 mask, int level)
 {
-	atomic64_t *sptep_atomic;
-
	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
		return tdp_mmu_clear_spte_bits_atomic(sptep, mask);

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3b996c1fdaab..4477201c2d53 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -178,6 +178,15 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm,
	     ((_only_valid) && (_root)->role.invalid))) {		\
	} else

+/*
+ * Iterate over all TDP MMU roots in an RCU read-side critical section.
+ */
+#define for_each_valid_tdp_mmu_root_rcu(_kvm, _root, _as_id)		\
+	list_for_each_entry_rcu(_root, &_kvm->arch.tdp_mmu_roots, link)	\
+		if ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) ||\
+		    (_root)->role.invalid) {				\
+		} else
+
 #define for_each_tdp_mmu_root(_kvm, _root, _as_id)			\
	__for_each_tdp_mmu_root(_kvm, _root, _as_id, false)

@@ -1222,6 +1231,26 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
	return ret;
 }

+static __always_inline bool kvm_tdp_mmu_handle_gfn_lockless(struct kvm *kvm,
+							    struct kvm_gfn_range *range,
+							    tdp_handler_t handler)
+{
+	struct kvm_mmu_page *root;
+	struct tdp_iter iter;
+	bool ret = false;
+
+	rcu_read_lock();
+
+	for_each_valid_tdp_mmu_root_rcu(kvm, root, range->slot->as_id) {
+		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end)
+			ret |= handler(kvm, &iter, range);
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
+
 /*
  * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
  * if any of the GFNs in the range have been accessed.
@@ -1240,23 +1269,21 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
		return false;

	if (spte_ad_enabled(iter->old_spte)) {
-		iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
-							 iter->old_spte,
-							 shadow_accessed_mask,
-							 iter->level);
+		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
+								shadow_accessed_mask);
		new_spte = iter->old_spte & ~shadow_accessed_mask;
	} else {
-		/*
-		 * Capture the dirty status of the page, so that it doesn't get
-		 * lost when the SPTE is marked for access tracking.
-		 */
+		new_spte = mark_spte_for_access_track(iter->old_spte);
+		if (__tdp_mmu_set_spte_atomic(iter, new_spte))
+			/*
+			 * The cmpxchg failed. Even if we had cleared the
+			 * Accessed bit, it likely would have been set again,
+			 * so this spte is probably young.
+			 */
+			return true;
+
		if (is_writable_pte(iter->old_spte))
			kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
-
-		new_spte = mark_spte_for_access_track(iter->old_spte);
-		iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
-							iter->old_spte, new_spte,
-							iter->level);
	}

	trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
@@ -1266,7 +1293,7 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,

 bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range);
+	return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, age_gfn_range);
 }

 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
@@ -1277,7 +1304,7 @@ static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,

 bool kvm_tdp_mmu_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	return kvm_tdp_mmu_handle_gfn(kvm, range, test_age_gfn);
+	return kvm_tdp_mmu_handle_gfn_lockless(kvm, range, test_age_gfn);
 }

 /*
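
The "treat a failed cmpxchg as young" policy from the commit message can be
illustrated outside the kernel. Below is a minimal, self-contained userspace
sketch using C11 atomics; ACCESSED_BIT, the spte layout, and age_spte() are
invented for illustration and are not KVM's definitions (the real logic is
age_gfn_range() in the diff above, which additionally handles access-track
sptes and dirty-state capture).

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit position; the real masks live in arch/x86/kvm/mmu/spte.h. */
#define ACCESSED_BIT	(1ULL << 5)

/*
 * Clear the Accessed bit with a single cmpxchg. If the cmpxchg fails,
 * someone else changed the spte concurrently (in KVM, the CPU's page
 * walker or another writer), so report the page as young rather than
 * retry: a concurrently-modified spte is most likely young anyway.
 */
static bool age_spte(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load(sptep);

	if (!(old & ACCESSED_BIT))
		return false;		/* already old */

	if (!atomic_compare_exchange_strong(sptep, &old, old & ~ACCESSED_BIT))
		return true;		/* lost the race: assume young */

	return true;			/* was young; now marked old */
}

int main(void)
{
	_Atomic uint64_t spte = ACCESSED_BIT | 0x1000;

	printf("young: %d\n", age_spte(&spte));	/* 1, and clears the bit */
	printf("young: %d\n", age_spte(&spte));	/* 0 */
	return 0;
}

The design choice mirrors the patch: retrying the cmpxchg would add
contention for little benefit, since the Accessed bit would most likely
just be set again.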
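
The RCU side can be sketched the same way. for_each_valid_tdp_mmu_root_rcu()
pairs list_for_each_entry_rcu() with an rcu_read_lock() taken by its caller;
the userspace analogue below uses liburcu (assumed to be installed; the
struct, field names, and build line are illustrative, not from the patch).

/* build (assuming liburcu is installed): gcc rcu_walk.c -o rcu_walk -lurcu */
#include <stdbool.h>
#include <stdio.h>
#include <urcu.h>		/* userspace RCU: rcu_read_lock() et al. */
#include <urcu/rculist.h>	/* cds_list_for_each_entry_rcu() */

/* Stand-in for struct kvm_mmu_page; only the fields the walk needs. */
struct root {
	int as_id;
	bool invalid;
	struct cds_list_head link;
};

static CDS_LIST_HEAD(roots);

int main(void)
{
	struct root a = { .as_id = 0 };
	struct root b = { .as_id = 1, .invalid = true };
	struct root *r;

	rcu_register_thread();	/* every thread using RCU must register */

	/* Writers publish roots with RCU-aware list helpers. */
	cds_list_add_rcu(&a.link, &roots);
	cds_list_add_rcu(&b.link, &roots);

	/*
	 * Readers traverse with no lock held, only an RCU read-side
	 * critical section, skipping invalid roots -- the same shape as
	 * kvm_tdp_mmu_handle_gfn_lockless() in the patch.
	 */
	rcu_read_lock();
	cds_list_for_each_entry_rcu(r, &roots, link) {
		if (r->invalid)
			continue;
		printf("aging under root with as_id %d\n", r->as_id);
	}
	rcu_read_unlock();

	rcu_unregister_thread();
	return 0;
}

As in the patch, the reader holds no lock: RCU guarantees that a root
removed concurrently stays safe to dereference until the read-side critical
section ends, and invalid roots are simply skipped.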