From patchwork Fri Feb 17 04:12:27 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13144311
Date: Thu, 16 Feb 2023 21:12:27 -0700
In-Reply-To: <20230217041230.2417228-1-yuzhao@google.com>
Message-Id: <20230217041230.2417228-3-yuzhao@google.com>
Mime-Version: 1.0
References: <20230217041230.2417228-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog
Subject: [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young()
From: Yu Zhao
To: Andrew Morton, Paolo Bonzini
Cc: Jonathan Corbet, Michael Larabel, kvmarm@lists.linux.dev,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-mm@google.com,
 Yu Zhao
This patch adds kvm_arch_test_clear_young() for the vast majority of VMs
that are not nested and run on hardware that sets the accessed bit in
TDP MMU page tables.
It relies on two techniques, RCU and cmpxchg, to safely test and clear
the accessed bit without taking the MMU lock. The former protects KVM
page tables from being freed while the latter clears the accessed bit
atomically against both the hardware and other software page table
walkers.

Signed-off-by: Yu Zhao
---
 arch/x86/include/asm/kvm_host.h | 27 ++++++++++++++++++++++
 arch/x86/kvm/mmu/spte.h         | 12 ----------
 arch/x86/kvm/mmu/tdp_mmu.c      | 41 +++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6aaae18f1854..d2995c9e8f07 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1367,6 +1367,12 @@ struct kvm_arch {
	 * the MMU lock in read mode + the tdp_mmu_pages_lock or
	 * the MMU lock in write mode
	 *
+	 * kvm_arch_test_clear_young() is a special case. It relies on two
+	 * techniques, RCU and cmpxchg, to safely test and clear the accessed
+	 * bit without taking the MMU lock. The former protects KVM page tables
+	 * from being freed while the latter clears the accessed bit atomically
+	 * against both the hardware and other software page table walkers.
+	 *
	 * Roots will remain in the list until their tdp_mmu_root_count
	 * drops to zero, at which point the thread that decremented the
	 * count to zero should removed the root from the list and clean
@@ -2171,4 +2177,25 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
			 KVM_X86_QUIRK_FIX_HYPERCALL_INSN |	\
			 KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS)

+extern u64 __read_mostly shadow_accessed_mask;
+
+/*
+ * Returns true if A/D bits are supported in hardware and are enabled by KVM.
+ * When enabled, KVM uses A/D bits for all non-nested MMUs. Because L1 can
+ * disable A/D bits in EPTP12, SP and SPTE variants are needed to handle the
+ * scenario where KVM is using A/D bits for L1, but not L2.
+ */
+static inline bool kvm_ad_enabled(void)
+{
+	return shadow_accessed_mask;
+}
+
+/* see the comments on the generic kvm_arch_has_test_clear_young() */
+#define kvm_arch_has_test_clear_young kvm_arch_has_test_clear_young
+static inline bool kvm_arch_has_test_clear_young(void)
+{
+	return IS_ENABLED(CONFIG_KVM) && IS_ENABLED(CONFIG_X86_64) &&
+	       (!IS_REACHABLE(CONFIG_KVM) || (kvm_ad_enabled() && tdp_enabled));
+}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 6f54dc9409c9..0dc7fed1f3fd 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -153,7 +153,6 @@ extern u64 __read_mostly shadow_mmu_writable_mask;
 extern u64 __read_mostly shadow_nx_mask;
 extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 extern u64 __read_mostly shadow_user_mask;
-extern u64 __read_mostly shadow_accessed_mask;
 extern u64 __read_mostly shadow_dirty_mask;
 extern u64 __read_mostly shadow_mmio_value;
 extern u64 __read_mostly shadow_mmio_mask;
@@ -247,17 +246,6 @@ static inline bool is_shadow_present_pte(u64 pte)
	return !!(pte & SPTE_MMU_PRESENT_MASK);
 }

-/*
- * Returns true if A/D bits are supported in hardware and are enabled by KVM.
- * When enabled, KVM uses A/D bits for all non-nested MMUs. Because L1 can
- * disable A/D bits in EPTP12, SP and SPTE variants are needed to handle the
- * scenario where KVM is using A/D bits for L1, but not L2.
- */
-static inline bool kvm_ad_enabled(void)
-{
-	return !!shadow_accessed_mask;
-}
-
 static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
 {
	return sp->role.ad_disabled;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d6df38d371a0..9028e09f1aab 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1309,6 +1309,47 @@ bool kvm_tdp_mmu_age_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
	return kvm_tdp_mmu_handle_gfn(kvm, range, age_gfn_range);
 }

+bool kvm_arch_test_clear_young(struct kvm *kvm, struct kvm_gfn_range *range,
+			       gfn_t lsb_gfn, unsigned long *bitmap)
+{
+	struct kvm_mmu_page *root;
+
+	if (WARN_ON_ONCE(!kvm_arch_has_test_clear_young()))
+		return false;
+
+	if (kvm_memslots_have_rmaps(kvm))
+		return false;
+
+	/* see the comments on kvm_arch->tdp_mmu_roots */
+	rcu_read_lock();
+
+	list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
+		struct tdp_iter iter;
+
+		if (kvm_mmu_page_as_id(root) != range->slot->as_id)
+			continue;
+
+		tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
+			u64 *sptep = rcu_dereference(iter.sptep);
+			u64 new_spte = iter.old_spte & ~shadow_accessed_mask;
+
+			VM_WARN_ON_ONCE(!page_count(virt_to_page(sptep)));
+			VM_WARN_ON_ONCE(iter.gfn < range->start || iter.gfn >= range->end);
+
+			if (new_spte == iter.old_spte)
+				continue;
+
+			/* see the comments on the generic kvm_arch_has_test_clear_young() */
+			if (__test_and_change_bit(lsb_gfn - iter.gfn, bitmap))
+				cmpxchg64(sptep, iter.old_spte, new_spte);
+		}
+	}
+
+	rcu_read_unlock();
+
+	return true;
+}
+
 static bool test_age_gfn(struct kvm *kvm, struct tdp_iter *iter,
			 struct kvm_gfn_range *range)
 {