From patchwork Wed May 15 00:59:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664493 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 01/16] KVM: x86: Add a VM type define for TDX Date: Tue, 14 May 2024
17:59:37 -0700 Message-Id: <20240515005952.3410568-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add a VM type define for TDX. Future changes will need to lay the ground work for TDX support by making some behavior conditional on the VM being a TDX guest. Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch, split from main series --- arch/x86/include/uapi/asm/kvm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 988b5204d636..4dea0cfeee51 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -922,5 +922,6 @@ struct kvm_hyperv_eventfd { #define KVM_X86_SEV_VM 2 #define KVM_X86_SEV_ES_VM 3 #define KVM_X86_SNP_VM 4 +#define KVM_X86_TDX_VM 5 #endif /* _ASM_X86_KVM_H */ From patchwork Wed May 15 00:59:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664495 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABDE263B9; Wed, 15 May 2024 01:00:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734807; cv=none; b=dNuFR22e1vS+VkWiD9NdfDYEW1gHUAFDt/1I04HDu4j9AMJjEmk0fsjPEGQkWOeQ8Z5Uni+hiL3u8ZPKKos+Z8+YB8YCS8zT6lYD4U/6i4c03clJTzJyuqBvHTbOlF/yKulK+x3RUoDBL73qUUzlRJhri6aBNHN1m+ZwPALDYrU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734807; c=relaxed/simple; bh=3IiuZ0F0+HeMw0aL1gEKx9S6ntoFeuhcsgyANUKYq88=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jO9lwsTp9XKpBDw2V2PfhGMDlbxeebulwm3ell8GwXme8PGeSuKSHWXYK9wQprgghKa4wFiJmaOcUcdoed5Tv3+O0Fi5Z1bnBZ0ydnOn0yTNFn2FBb/Vcwzm1aKbirOTo0iJreb8vz+fjl2aNN/ChiHQLQooZH8UeCTSu5RGXbg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=dpixqwNZ; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="dpixqwNZ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734805; x=1747270805; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3IiuZ0F0+HeMw0aL1gEKx9S6ntoFeuhcsgyANUKYq88=; b=dpixqwNZOwgWeE1e8+SQBumiC3H5qyXlAcct7c/InUWYXKR92h5o4YGR BZaS5Uj+XTr/Jw16j3A9BDEbzJlfIejzhiJ96x0J3otek/rORv6WeLj2m N+AAxnFXT/pMQGgvisiEYYX5YR+tc3IcFH8a3jnN7UE9DWkexto6Q8a8J lSh+6wjwdHLeyP3NySQFcOjR+R3eQ/z2e2VTDqsw0W/IH/xBKhu78C1Bf mbsv0ufBcwM39Ba5qxrhYg5qtx7/IMHHIUqxLVZKom5c2768Z/BGaoJE+ pb0WPCvdR++ZuCHrJs9EuS1hQMW1V6SuWrRvvM6+3Citg5/M7k/PCxERh g==; X-CSE-ConnectionGUID: 
Z5zfnomhSu+PRRz+z4btOA== From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion Date: Tue, 14 May 2024 17:59:38 -0700 Message-Id: <20240515005952.3410568-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org MIME-Version: 1.0

From: Yan Zhao

Introduce a per-memslot flag KVM_MEM_ZAP_LEAFS_ONLY to permit zapping only leaf SPTEs when deleting a memslot.

Today, "zapping only memslot leaf SPTEs" on memslot deletion is not done. Instead, KVM invalidates all old TDPs (i.e. EPT for Intel or NPT for AMD) and generates fresh new TDPs based on the new memslot layout. This is because zapping and re-generating TDPs is low overhead for most use cases, and more importantly because of a bug [1] which caused VM instability when a VM had an Nvidia GeForce GPU assigned.

There was a previous attempt [2] to introduce a per-VM flag to work around bug [1] by only allowing "zapping only memslot leaf SPTEs" for specific VMs. However, [2] was not merged due to the lack of a clear explanation of exactly what is broken [3], and because it's not wise to "have a bug that is known to happen when you enable the capability".

However, for some specific scenarios, e.g. TDX, invalidating and re-generating a new page table is not viable for the following reasons:
- TDX requires that the root page of the private page table remain unaltered throughout the TD life cycle.
- TDX mandates that leaf entries in the private page table be zapped prior to non-leaf entries.

So, Sean reconsidered introducing a per-VM flag or a per-memslot flag for VMs like TDX. [4]

This patch implements the per-memslot flag. Compared to the per-VM flag approach:

Pros:
(1) Allowing userspace to control the zapping behavior at fine granularity means optimizations for specific use cases can be developed without future kernel changes.
(2) Allows developing new zapping behaviors without risking regressions by changing KVM behavior, as seen previously.

Cons:
(1) Users need to ensure all necessary memslots have the KVM_MEM_ZAP_LEAFS_ONLY flag set, e.g. QEMU needs to ensure every GUEST_MEMFD memslot has the ZAP_LEAFS_ONLY flag for a TDX VM (see the illustrative sketch below).
(2) Opens up the possibility that userspace could configure memslots for a normal VM in such a way that bug [1] is seen.
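To make Cons (1) concrete, here is a rough userspace sketch of how a VMM such as QEMU might register a guest_memfd-backed memslot for a TDX VM with the proposed flag set. It assumes this patch's uAPI (KVM_MEM_ZAP_LEAFS_ONLY) is applied on top of the existing KVM_SET_USER_MEMORY_REGION2 / KVM_MEM_GUEST_MEMFD interface; the function name, slot layout, and lack of error handling are purely illustrative and not part of this series.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative only: register a private (guest_memfd) slot that KVM will
 * zap leaf-only on deletion, as TDX requires with this patch applied. */
static int set_tdx_private_slot(int vm_fd, __u32 slot, __u64 gpa, __u64 size,
				__u64 uaddr, int gmem_fd)
{
	struct kvm_userspace_memory_region2 region;

	memset(&region, 0, sizeof(region));
	region.slot = slot;
	region.flags = KVM_MEM_GUEST_MEMFD | KVM_MEM_ZAP_LEAFS_ONLY;
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.userspace_addr = uaddr;	/* backing for the shared alias */
	region.guest_memfd = gmem_fd;
	region.guest_memfd_offset = 0;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}

When such a slot is later deleted, kvm_arch_flush_shadow_memslot() takes the new kvm_mmu_zap_memslot_leafs() path below instead of kvm_mmu_zap_all_fast().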
However, one thing deserves noting for TDX, is that TDX may potentially meet bug [1] for either per-memslot flag or per-VM flag approach, since there's a usage in radar to assign an untrusted & passthrough GPU device in TDX. If that happens, it can be treated as a bug (not regression) and fixed accordingly. An alternative approach we can also consider is to always invalidate & rebuild all shared page tables and zap only memslot leaf SPTEs for mirrored and private page tables on memslot deletion. This approach could exempt TDX from bug [1] when "untrusted & passthrough" devices are involved. But downside is that this approach requires creating new very specific KVM zapping ABI that could limit future changes in the same way that the bug did for normal VMs. Link: https://patchwork.kernel.org/project/kvm/patch/20190205210137.1377-11-sean.j.christopherson@intel.com [1] Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#mabc0119583dacf621025e9d873c85f4fbaa66d5c [2] Link: https://lore.kernel.org/kvm/20200713190649.GE29725@linux.intel.com/T/#m1839c85392a7a022df9e507876bb241c022c4f06 [3] Link: https://lore.kernel.org/kvm/ZhSYEVCHqSOpVKMh@google.com [4] Signed-off-by: Yan Zhao Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu/mmu.c | 30 +++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 17 +++++++++++++++++ include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 5 ++++- 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 61982da8c8b2..4a8e819794db 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6962,10 +6962,38 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm) kvm_mmu_zap_all(kvm); } +static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *slot) +{ + if (KVM_BUG_ON(!tdp_mmu_enabled, kvm)) + return; + + write_lock(&kvm->mmu_lock); + + /* + * Zapping non-leaf SPTEs, a.k.a. not-last SPTEs, isn't required, worst + * case scenario we'll have unused shadow pages lying around until they + * are recycled due to age or when the VM is destroyed. + */ + struct kvm_gfn_range range = { + .slot = slot, + .start = slot->base_gfn, + .end = slot->base_gfn + slot->npages, + .may_block = true, + }; + + if (kvm_tdp_mmu_unmap_gfn_range(kvm, &range, false)) + kvm_flush_remote_tlbs(kvm); + + write_unlock(&kvm->mmu_lock); +} + void kvm_arch_flush_shadow_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) { - kvm_mmu_zap_all_fast(kvm); + if (slot->flags & KVM_MEM_ZAP_LEAFS_ONLY) + kvm_mmu_zap_memslot_leafs(kvm, slot); + else + kvm_mmu_zap_all_fast(kvm); } void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7c593a081eba..4b3ec2ec79e9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12952,6 +12952,23 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn()) return -EINVAL; + /* + * Since TDX private pages requires re-accepting after zap, + * and TDX private root page should not be zapped, TDX requires + * memslots for private memory must have flag + * KVM_MEM_ZAP_LEAFS_ONLY set too, so that only leaf SPTEs of + * the deleting memslot will be zapped and SPTEs in other + * memslots would not be affected. 
+ */ + if (kvm->arch.vm_type == KVM_X86_TDX_VM && + (new->flags & KVM_MEM_GUEST_MEMFD) && + !(new->flags & KVM_MEM_ZAP_LEAFS_ONLY)) + return -EINVAL; + + /* zap-leafs-only works only when TDP MMU is enabled for now */ + if ((new->flags & KVM_MEM_ZAP_LEAFS_ONLY) && !tdp_mmu_enabled) + return -EINVAL; + return kvm_alloc_memslot_metadata(kvm, new); } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index aee67912e71c..d53648c19b26 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -51,6 +51,7 @@ struct kvm_userspace_memory_region2 { #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) #define KVM_MEM_GUEST_MEMFD (1UL << 2) +#define KVM_MEM_ZAP_LEAFS_ONLY (1UL << 3) /* for KVM_IRQ_LINE */ struct kvm_irq_level { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 81b90bf03f2f..1b1ffb6fc786 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1568,6 +1568,8 @@ static int check_memory_region_flags(struct kvm *kvm, if (kvm_arch_has_private_mem(kvm)) valid_flags |= KVM_MEM_GUEST_MEMFD; + valid_flags |= KVM_MEM_ZAP_LEAFS_ONLY; + /* Dirty logging private memory is not currently supported. */ if (mem->flags & KVM_MEM_GUEST_MEMFD) valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES; @@ -2052,7 +2054,8 @@ int __kvm_set_memory_region(struct kvm *kvm, return -EINVAL; if ((mem->userspace_addr != old->userspace_addr) || (npages != old->npages) || - ((mem->flags ^ old->flags) & KVM_MEM_READONLY)) + ((mem->flags ^ old->flags) & + (KVM_MEM_READONLY | KVM_MEM_ZAP_LEAFS_ONLY))) return -EINVAL; if (base_gfn != old->base_gfn) From patchwork Wed May 15 00:59:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664494 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBD6963D5; Wed, 15 May 2024 01:00:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734806; cv=none; b=XIMFTvwmJmrwo5E9thpGqs9Lx7cEOD5AA5jyyaqSeOvSCJTM7sjBkO1NZ1wp3Ozmpes4eIassZifuG/+XnygzGBMBxsUv4GFKOn8TdNiupLVd3+4BCdB9M6SLBul2wKHZ5y/AarWhMnfjcuFsGUBRLnqnVLY3uuDW7OUnteCeRI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734806; c=relaxed/simple; bh=v6Cuz4zx2wjPGlGh14VUElcDcI+9AmfYj5Tp5b9rlU8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Gu3eYArLO+EW/j+0jPlOvUnbdvCL82fZ4FwtPw4LxJMMZ4tbNOgIlSmIFJyTODhcICo+nF2bi2nbPrX3EmtZqnuAPZ0SZJCSKiIk29jAxDTz7YiUDZ91YaD6PmwNfy8vSm4i/ZYJtMxJuNEvISUkGN21mtv2AZ5fZBkecagAvdw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=HdA5UoLg; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="HdA5UoLg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734805; x=1747270805; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=v6Cuz4zx2wjPGlGh14VUElcDcI+9AmfYj5Tp5b9rlU8=; b=HdA5UoLgQMGvRLAevm/KcR0Utgh3n3L+8mPLcEu3ge/7dke4D2TvxjhN 7hCdOHVCAHO8bjtEPNRy7yyO4ewk2rZ8ukY8bZL4qMFir7jj5ly56yAeF h1dDeW6XI/Xa6UP1liP2xGATGJJxLKfLZVkCHxU8cPBTfOecBt2S3TE9U J+aeuvC8uYeEh5PiBfCZsFYka7UmN/cHHVaaK32hzyLQBI1anrfokvnFQ gejJAbSB6FNqdrclE+xWTSOvlCQ/CphcyNEvLiO8IBSN1wPHwulLBGjGt XgnbuRxYLbI0uMljrh053N0UFozJifsgn9wA9kIlNh/WqRi4iXp2R6GE+ A==; X-CSE-ConnectionGUID: p5VVk2WYSwGHh7zTb+AW6g== X-CSE-MsgGUID: BKKy142LTn2ESNJbs6qfxQ== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613937" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613937" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:03 -0700 X-CSE-ConnectionGUID: KkRdh+qpQ4GfL6f/kt9Bug== X-CSE-MsgGUID: TU/ohH1tRra8vedU9iOw9Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942719" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:02 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 03/16] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Date: Tue, 14 May 2024 17:59:39 -0700 Message-Id: <20240515005952.3410568-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Export a function to walk down the TDP without modifying it. Future changes will support pre-populating TDX private memory. In order to implement this KVM will need to check if a given GFN is already pre-populated in the mirrored EPT, and verify the populated private memory PFN matches the current one.[1] There is already a TDP MMU walker, kvm_tdp_mmu_get_walk() for use within the KVM MMU that almost does what is required. However, to make sense of the results, MMU internal PTE helpers are needed. Refactor the code to provide a helper that can be used outside of the KVM MMU code. Refactoring the KVM page fault handler to support this lookup usage was also considered, but it was an awkward fit. Link: https://lore.kernel.org/kvm/ZfBkle1eZFfjPI8l@google.com/ [1] Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- This helper will be used in the future change that implements KVM_TDX_INIT_MEM_REGION. 
Please refer to the following commit for the usage: https://github.com/intel/tdx/commit/2832c6d87a4e6a46828b193173550e80b31240d4 TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu.h | 3 +++ arch/x86/kvm/mmu/tdp_mmu.c | 37 +++++++++++++++++++++++++++++++++---- 2 files changed, 36 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index dc80e72e4848..3c7a88400cbb 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -275,6 +275,9 @@ extern bool tdp_mmu_enabled; #define tdp_mmu_enabled false #endif +int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa, + kvm_pfn_t *pfn); + static inline bool kvm_memslots_have_rmaps(struct kvm *kvm) { return !tdp_mmu_enabled || kvm_shadow_root_allocated(kvm); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1259dd63defc..1086e3b2aa5c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1772,16 +1772,14 @@ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, * * Must be called between kvm_tdp_mmu_walk_lockless_{begin,end}. */ -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, - int *root_level) +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, + bool is_private) { struct tdp_iter iter; struct kvm_mmu *mmu = vcpu->arch.mmu; gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; - *root_level = vcpu->arch.mmu->root_role.level; - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf = iter.level; sptes[leaf] = iter.old_spte; @@ -1790,6 +1788,37 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, return leaf; } +int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, + int *root_level) +{ + *root_level = vcpu->arch.mmu->root_role.level; + + return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, false); +} + +int kvm_tdp_mmu_get_walk_private_pfn(struct kvm_vcpu *vcpu, u64 gpa, + kvm_pfn_t *pfn) +{ + u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte; + int leaf; + + lockdep_assert_held(&vcpu->kvm->mmu_lock); + + rcu_read_lock(); + leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, true); + rcu_read_unlock(); + if (leaf < 0) + return -ENOENT; + + spte = sptes[leaf]; + if (!(is_shadow_present_pte(spte) && is_last_spte(spte, leaf))) + return -ENOENT; + + *pfn = spte_to_pfn(spte); + return leaf; +} +EXPORT_SYMBOL_GPL(kvm_tdp_mmu_get_walk_private_pfn); + /* * Returns the last level spte pointer of the shadow page walk for the given * gpa, and sets *spte to the spte value. This spte may be non-preset. 
If no From patchwork Wed May 15 00:59:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664496 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EBEB79C8; Wed, 15 May 2024 01:00:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; cv=none; b=DchnOauOSx5Oj/40XJ8HH6pC1+BECVDdfk9evILnk2WG91qIfJuTA+SOeaOP20iWsedf9YmwwwRLaHzeM1XSktdZZOSKH2tMffclelXsBLfBqnQLvdmTjsqG1UAUIBepmyynb1oHfHs2vfjaSUbveElaBYzrVYMyPt38R8AYSZA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; c=relaxed/simple; bh=EiPyaz12Hs/yvu2vb2xAWA/RYdi5EWxAwUrWQYyN7dw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=omXYjLZAJZNkVKgFuc27xWr/qpePJh5RuQC+A5VCpKBUiyTLGGRf0aBAWJoPmkXx9JJ10zEJ7V6TZ8sI6AJ0Hz3mSnbg1rwsedW+zT1UcUMN6CVBh0kzfo0CgyzkAdcS8s1Ae1mDg4JP04fr8NeNEWxTMviU3tpEed3kwtOa+Jg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ELClGgbz; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ELClGgbz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734805; x=1747270805; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EiPyaz12Hs/yvu2vb2xAWA/RYdi5EWxAwUrWQYyN7dw=; b=ELClGgbzEWPPrOQH8LYy5ogKNpc3jx+tSlMjye+JiqFExsvgHzzG5Ct9 qitUmcVoDVhkpetFmKXVQPz5vbo3K958kRmRMhGVT03kWxl2yOMXpRV1l 8LbUHuVOAbukSalUjQQG+pKGdz0qOk53LdC849VmgH4RwWTxnPM+c11ou m9S6XowJ3sIy4Kexg/CBE4bx12e/9BNqlro6U4IbhtA6j4mgHkZFGQ+PL wHHZ5P2WBZXY+5wXa8IAPV48xnOhAOvnlndsOA4s65ZWz6dVJMGh0TysD dVgIKg510yGYw1RLTnUyDBprb/sYn7JLzwlRuHSPAPF4+93uxjxdsxfBv A==; X-CSE-ConnectionGUID: JscFULhURNOu5LsnGHxd2A== X-CSE-MsgGUID: KLCznybqRSyZBVQ8OZh0XA== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613944" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613944" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:04 -0700 X-CSE-ConnectionGUID: 59cLFGsYRK+7uA0a878tEw== X-CSE-MsgGUID: G3nbzKu1SHaBw238yzeYqA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942724" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:02 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 04/16] KVM: x86/mmu: Add address conversion functions for TDX shared 
bit of GPA Date: Tue, 14 May 2024 17:59:40 -0700 Message-Id: <20240515005952.3410568-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Introduce a "gfn_shared_mask" field in the kvm_arch structure to record GPA shared bit and provide address conversion helpers for TDX shared bit of GPA. TDX designates a specific GPA bit as the shared bit, which can be either bit 51 or bit 47 based on configuration. This GPA shared bit indicates whether the corresponding physical page is shared (if shared bit set) or private (if shared bit cleared). - GPAs with shared bit set will be mapped by VMM into conventional EPT, which is pointed by shared EPTP in TDVMCS, resides in host VMM memory and is managed by VMM. - GPAs with shared bit cleared will be mapped by VMM firstly into a mirrored EPT, which resides in host VMM memory. Changes of the mirrored EPT are then propagated into a private EPT, which resides outside of host VMM memory and is managed by TDX module. Add the "gfn_shared_mask" field to the kvm_arch structure for each VM with a default value of 0. It will be set to the position of the GPA shared bit in GFN through TD specific initialization code. Provide helpers to utilize the gfn_shared_mask to determine whether a GPA is shared or private, retrieve the GPA shared bit value, and insert/strip shared bit to/from a GPA. Signed-off-by: Isaku Yamahata Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu --- TDX MMU Part 1: - Update commit log (Yan) - Fix documentation on kvm_is_private_gpa() (Binbin) v19: - Add comment on default vm case. - Added behavior table in the commit message - drop CONFIG_KVM_MMU_PRIVATE v18: - Added Reviewed-by Binbin --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index aabf1648a56a..d2f924f1d579 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1519,6 +1519,8 @@ struct kvm_arch { */ #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1) struct kvm_mmu_memory_cache split_desc_cache; + + gfn_t gfn_shared_mask; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 3c7a88400cbb..dac13a2d944f 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -321,4 +321,37 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu, return gpa; return translate_nested_gpa(vcpu, gpa, access, exception); } + +/* + * default or SEV-SNP TDX: where S = (47 or 51) - 12 + * gfn_shared_mask 0 S bit + * is_private_gpa() always false true if GPA has S bit clear + * gfn_to_shared() nop set S bit + * gfn_to_private() nop clear S bit + * + * fault.is_private means that host page should be gotten from guest_memfd + * is_private_gpa() means that KVM MMU should invoke private MMU hooks. 
+ */ +static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm) +{ + return kvm->arch.gfn_shared_mask; +} + +static inline gfn_t kvm_gfn_to_shared(const struct kvm *kvm, gfn_t gfn) +{ + return gfn | kvm_gfn_shared_mask(kvm); +} + +static inline gfn_t kvm_gfn_to_private(const struct kvm *kvm, gfn_t gfn) +{ + return gfn & ~kvm_gfn_shared_mask(kvm); +} + +static inline bool kvm_is_private_gpa(const struct kvm *kvm, gpa_t gpa) +{ + gfn_t mask = kvm_gfn_shared_mask(kvm); + + return mask && !(gpa_to_gfn(gpa) & mask); +} + #endif From patchwork Wed May 15 00:59:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664497 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3EF7EEBF; Wed, 15 May 2024 01:00:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; cv=none; b=h+AjV5FpgyWvCXe/4ch+NAI6qC6lVWYbUlk4LV0nZWNv25SUNxnCZiMF/szIJ3np5GVgiU9aPxmMudrYvuN3H1Nr3+93pYobyv8GGTFB9KLMQDnReP4t+aggy4eSK1So6HmDWndSy7AvxjWoE2cFSq9xK3mQd+ub03/dh1PxiTk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; c=relaxed/simple; bh=SwveWfaPqRHgaQcN/Q/O3ad7FH2+3nwsm35+tH34STs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=OWroOX0uro7G2TmNrweZdoZeLrmGssYrE0vxxk7CO89YdCj/BA2X0Rz7F34fPbDmHxcmkGs50AmIfkspSpDCxp2ZpYlYUtJXL1y71dcPkiRTkT7/+OPRKgkTGVlc9oYpFtdEdO0HVZ6INYGmggMBnP8a9schdC6DLrvrZGutVD8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=eAJjHG3z; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eAJjHG3z" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734807; x=1747270807; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SwveWfaPqRHgaQcN/Q/O3ad7FH2+3nwsm35+tH34STs=; b=eAJjHG3zUUYMDJsFvETIskXE7lYy47dZqZ96lp2re5rvkhOPUXml/ovy rWc/9japptXM7Ufkoeamtux3fNE3yJDmQ025T0Bg4OTvYnMFj1c7kVK4k myZeISle/7eyj8zZLM3PsU14hpSqO7mKZkGO3UoMLkQFl4T/YkDdIqFHn k3w3XAm8S3P2WiWe89eddWkouldRctKtvPVc/KsM4qq1I1ot8S86IZWno I4X7oAqrNDkckTnidWGkxMs/O3jiz5ojUD63AP2j714YIAJl1sXEXE8aN 61WAEq6m4+n8IQw56Rnq6bHP8bqVKmphFnFKmuaf42i2T/hENogNI/PEu w==; X-CSE-ConnectionGUID: fI0JRhpNQquzviuX5xtk2Q== X-CSE-MsgGUID: 0nSd47Y1RCiLlmpr7JWJbA== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613950" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613950" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:04 -0700 X-CSE-ConnectionGUID: 6R0YLLi+Q1+DSOYzLITwJQ== X-CSE-MsgGUID: 98kC1KSSTjW6O21Op7AO9g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942735" Received: 
from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:03 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 05/16] KVM: Add member to struct kvm_gfn_range for target alias Date: Tue, 14 May 2024 17:59:41 -0700 Message-Id: <20240515005952.3410568-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org MIME-Version: 1.0

From: Isaku Yamahata

Add a new member to struct kvm_gfn_range to indicate which mapping (private vs. shared) to operate on: enum kvm_process process. Update the core zapping operations to set it appropriately.

TDX utilizes two GPA aliases for the same memslots: one for private memory and one for shared memory. For private memory, KVM cannot always perform the same operations it does on memory for default VMs, such as zapping pages and having them be faulted back in, as this requires guest coordination. However, some operations, such as guest-driven conversion of memory between private and shared, should zap private memory.

Internally to the MMU, private and shared mappings are tracked on separate roots. Mapping and zapping operations will operate on the respective GFN alias for each root (private or shared), so zapping operations will by default zap both aliases. Add a field to struct kvm_gfn_range that allows callers to specify which aliases to target, so they can operate on only the aliases appropriate for their specific operation.

There was feedback that target aliases should be specified such that the default value (0) operates on both aliases. Several options were considered, including several variations of separate bools defined such that the default behavior was to process both aliases; they either allowed nonsensical configurations or were confusing for the caller. A simple enum was also explored and came close, but was hard to process in the caller. Instead, use an enum with the default value (0) reserved as a disallowed value, and catch ranges that didn't have the target aliases specified by looking for that specific value.

Set the target alias appropriately for these MMU operations:
- For KVM's MMU notifier callbacks, zap shared pages only, because private pages won't have a userspace mapping.
- For setting memory attributes, kvm_arch_pre_set_memory_attributes() chooses the aliases based on the attribute.
- For guest_memfd invalidations, zap private only.

Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/
Signed-off-by: Isaku Yamahata
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
TDX MMU Part 1:
- Replaced KVM_PROCESS_BASED_ON_ARG with BUGGY_KVM_INVALIDATION to follow the original suggestion and not populate kvm_handle_gfn_range(). Also add a WARN_ON_ONCE().
- Move attribute specific logic into kvm_vm_set_mem_attributes() - Drop Sean's suggested-by tag as the solution has changed - Re-write commit log v18: - rebased to kvm-next v3: - Drop the KVM_GFN_RANGE flags - Updated struct kvm_gfn_range - Change kvm_arch_set_memory_attributes() to return bool for flush - Added set_memory_attributes x86 op for vendor backends - Refined commit message to describe TDX care concretely v2: - consolidate KVM_GFN_RANGE_FLAGS_GMEM_{PUNCH_HOLE, RELEASE} into KVM_GFN_RANGE_FLAGS_GMEM. - Update the commit message to describe TDX more. Drop SEV_SNP. --- arch/x86/kvm/mmu/mmu.c | 12 ++++++++++++ include/linux/kvm_host.h | 8 ++++++++ virt/kvm/guest_memfd.c | 2 ++ virt/kvm/kvm_main.c | 14 ++++++++++++++ 4 files changed, 36 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 4a8e819794db..1998267a330e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6979,6 +6979,12 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s .start = slot->base_gfn, .end = slot->base_gfn + slot->npages, .may_block = true, + + /* + * All private and shared page should be zapped on memslot + * deletion. + */ + .process = KVM_PROCESS_PRIVATE_AND_SHARED, }; if (kvm_tdp_mmu_unmap_gfn_range(kvm, &range, false)) @@ -7479,6 +7485,12 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm))) return false; + /* Unmmap the old attribute page. */ + if (range->arg.attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE) + range->process = KVM_PROCESS_SHARED; + else + range->process = KVM_PROCESS_PRIVATE; + return kvm_unmap_gfn_range(kvm, range); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c3c922bf077f..f92c8b605b03 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -260,11 +260,19 @@ union kvm_mmu_notifier_arg { unsigned long attributes; }; +enum kvm_process { + BUGGY_KVM_INVALIDATION = 0, + KVM_PROCESS_SHARED = BIT(0), + KVM_PROCESS_PRIVATE = BIT(1), + KVM_PROCESS_PRIVATE_AND_SHARED = KVM_PROCESS_SHARED | KVM_PROCESS_PRIVATE, +}; + struct kvm_gfn_range { struct kvm_memory_slot *slot; gfn_t start; gfn_t end; union kvm_mmu_notifier_arg arg; + enum kvm_process process; bool may_block; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 9714add38852..e5ff6fde2db3 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -109,6 +109,8 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start, .end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff, .slot = slot, .may_block = true, + /* guest memfd is relevant to only private mappings. */ + .process = KVM_PROCESS_PRIVATE, }; if (!found_memslot) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1b1ffb6fc786..cc434c7509f1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -635,6 +635,11 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm, */ gfn_range.arg = range->arg; gfn_range.may_block = range->may_block; + /* + * HVA-based notifications aren't relevant to private + * mappings as they don't have a userspace mapping. 
+ */ + gfn_range.process = KVM_PROCESS_SHARED; /* * {gfn(page) | page intersects with [hva_start, hva_end)} = @@ -2453,6 +2458,14 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm, gfn_range.arg = range->arg; gfn_range.may_block = range->may_block; + /* + * If/when KVM supports more attributes beyond private .vs shared, this + * _could_ set exclude_{private,shared} appropriately if the entire target + * range already has the desired private vs. shared state (it's unclear + * if that is a net win). For now, KVM reaches this point if and only + * if the private flag is being toggled, i.e. all mappings are in play. + */ + for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { slots = __kvm_memslots(kvm, i); @@ -2509,6 +2522,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end, struct kvm_mmu_notifier_range pre_set_range = { .start = start, .end = end, + .arg.attributes = attributes, .handler = kvm_pre_set_memory_attributes, .on_lock = kvm_mmu_invalidate_begin, .flush_on_ret = true, From patchwork Wed May 15 00:59:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664498 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 203E0107A6; Wed, 15 May 2024 01:00:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; cv=none; b=Z54P11ETZ2eLP+QsmavJTNbjPiXDh8jCLL6ZV19V6lZEazCcXZZoo+DpbNmnVhmlFXk7ETslbXSgZcIaajncfYh9CMO5YheOV61oofWvTCkmaRSyvXlIXcijQr5zL8mQ5uUZYO9ycPcI3djP75nhffbPP7IcLDkttmgWgzTWXjc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734808; c=relaxed/simple; bh=3riEeBhx8HzcTBEGj7bVXK60LuEYP5+ldqG1Lza6N1w=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=u09Xs/CSPtiFaz3KO1b7K8AB+m7WuPqJv7PsYxRKhKeW4KeGwgoAI3Q7VmapigESVGbraxFkZyojKoAg32gXFoV/cZLK3mVc/hJkpMQ3VTsrbP44/GdQ3zHXoyYPkkxllM4wsolNwdHHd4dFtAWUen3j3mUTzbzDxyd8UH5iTFo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kis8Slub; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kis8Slub" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734807; x=1747270807; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3riEeBhx8HzcTBEGj7bVXK60LuEYP5+ldqG1Lza6N1w=; b=kis8SlubXlqWRPw0CjW3BqFRbt9q9lANPwWHjQHLqXv+zwwi1LeD8cEA bUw/mYu3ki4UdgDngF8adGDq44jxibmZxuTzDesXM3Guvm9sJfNPwhjlR ycpSza3zY6yoiw5w2NydDTFdxlAtaUAypqZskJGmzP4ALPvhB4cDOe+yy nFKVv25BgudvMCW3EKpIIWHzj1/E78bp8Gc19Q3aDT++4fLeMrwtkTij5 BfwvYmsvNjIMh+STOuh6E75Kvejpcgu6UKVK2dWjxNR5aZiAWmdiCakLZ HNefuZ3/x6vcmj85eLqEZ3eQvoD4ikx+57hHE3Q8OTduYLAFZpUt4UWK/ Q==; X-CSE-ConnectionGUID: Q1eCECccTP2qSzmiFe906g== 
X-CSE-MsgGUID: /g9g7oVwSbCnOhZA/HoWTg== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613955" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613955" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:04 -0700 X-CSE-ConnectionGUID: hLuB4nMQRHKGeaLU1S3wxQ== X-CSE-MsgGUID: jkgl2X01RX2y2ngdP0369g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942747" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:03 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 06/16] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role Date: Tue, 14 May 2024 17:59:42 -0700 Message-Id: <20240515005952.3410568-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Introduce a "is_private" member to the kvm_mmu_page_role union to identify SPTEs associated with the mirrored EPT. The TDX module maintains the private half of the EPT mapped in the TD in its protected memory. KVM keeps a copy of the private GPAs in a mirrored EPT tree within host memory, recording the root page HPA in each vCPU's mmu->private_root_hpa. This "is_private" attribute enables vCPUs to find and get the root page of mirrored EPT from the MMU root list for a guest TD. This also allows KVM MMU code to detect changes in mirrored EPT according to the "is_private" mmu page role and propagate the changes to the private EPT managed by TDX module. Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - Remove warning and NULL check in is_private_sptep() (Rick) - Update commit log (Yan) v19: - Fix is_private_sptep() when NULL case. - drop CONFIG_KVM_MMU_PRIVATE --- arch/x86/include/asm/kvm_host.h | 13 ++++++++++++- arch/x86/kvm/mmu/mmu_internal.h | 5 +++++ arch/x86/kvm/mmu/spte.h | 5 +++++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d2f924f1d579..13119d4e44e5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -351,7 +351,8 @@ union kvm_mmu_page_role { unsigned ad_disabled:1; unsigned guest_mode:1; unsigned passthrough:1; - unsigned :5; + unsigned is_private:1; + unsigned :4; /* * This is left at the top of the word so that @@ -363,6 +364,16 @@ union kvm_mmu_page_role { }; }; +static inline bool kvm_mmu_page_role_is_private(union kvm_mmu_page_role role) +{ + return !!role.is_private; +} + +static inline void kvm_mmu_page_role_set_private(union kvm_mmu_page_role *role) +{ + role->is_private = 1; +} + /* * kvm_mmu_extended_role complements kvm_mmu_page_role, tracking properties * relevant to the current MMU configuration. 
When loading CR0, CR4, or EFER, diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 706f0ce8784c..b114589a595a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -145,6 +145,11 @@ static inline int kvm_mmu_page_as_id(struct kvm_mmu_page *sp) return kvm_mmu_role_as_id(sp->role); } +static inline bool is_private_sp(const struct kvm_mmu_page *sp) +{ + return kvm_mmu_page_role_is_private(sp->role); +} + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) { /* diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 5dd5405fa07a..d0df691ced5c 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -265,6 +265,11 @@ static inline struct kvm_mmu_page *root_to_sp(hpa_t root) return spte_to_child_sp(root); } +static inline bool is_private_sptep(u64 *sptep) +{ + return is_private_sp(sptep_to_sp(sptep)); +} + static inline bool is_mmio_spte(struct kvm *kvm, u64 spte) { return (spte & shadow_mmio_mask) == kvm->arch.shadow_mmio_value && From patchwork Wed May 15 00:59:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664501 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD9801EEE3; Wed, 15 May 2024 01:00:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734812; cv=none; b=VHcGs0Vfh7PKv1cekngrzuYVEjPr2qXlk5TeFjQeUY5EaRqNWwpuNfJRZpyAnEIjNkTWgj3PQKBAogqKW4Oyum3UCLdA1L/PllAQCbwaB4gpbjfQLMqMXaSOipJrktgD3icBUx08XuYh4w5M6J5BVr1o2gfUwPIedH4tgGbkL3k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734812; c=relaxed/simple; bh=ZhLlZRSc1N8HWBGh0Hwv4AHosc/nMbevb/CrbuiWxcc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=XkTOpelE2FXTVl+QWHOX+gibbz3I0xXcUeBaa/YIqeVaFl2FA3ahUHhvanrOX+3ml8Os1RlY0klOv5bD6BTfpEYthfrVQQGUdZNSl5D2NvxfB4z6X7G92FA5PyfeRHBSSsNwc8w1H7HGB73llZ/kIK595v9AQth4fdNFb3UeH5o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=TnRe3VTF; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="TnRe3VTF" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734809; x=1747270809; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZhLlZRSc1N8HWBGh0Hwv4AHosc/nMbevb/CrbuiWxcc=; b=TnRe3VTF/zs7alAOsZK9NV7y9HuQk5Z011VFxNlPpjD4hTcIHRGbK7m5 w1GpO7dJKKFj3HYX1zDL4sUvtUIoh2UAcm0ZjkquZZ4J3P+NhZLwqTe+F roNNP6sp+Y8+B9o7OUkpGMnz9jKeMXU89ZsTeKfSG6JdyxuF+GEN6g3/A 7lkvJ+5E3fpkoFcsntzBK9ytLMmMOR8cIRlQl3j1HleUf9YCYMCucsRdf cnGA2gwjUQvzht5N/RIxRMz1ZBTMtXnEjFweYZtYhnzXFtNEbtuP5pP7a NO7PjZfqtq4eGtAD3dJTXcUW+z4DeuA3rN9SPbF00PvnpZEIOTK2+dxtp w==; X-CSE-ConnectionGUID: 
4g2iXPXRR7it1SjX3qdpGg== X-CSE-MsgGUID: 80cxgbCfQP+fU8k+eIMuJQ== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613959" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613959" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:05 -0700 X-CSE-ConnectionGUID: cY4EQ/NfSsqzZ1SkRhSwGg== X-CSE-MsgGUID: Dii6f8PhTQi6/zFUhC+cPQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942754" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:04 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 07/16] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page Date: Tue, 14 May 2024 17:59:43 -0700 Message-Id: <20240515005952.3410568-8-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add a private pointer to struct kvm_mmu_page for the private page table and add helper functions to allocate/initialize/free a private page table page. Because KVM TDP MMU doesn't use unsync_children and write_flooding_count, pack them to have room for a pointer and use a union to avoid memory overhead. For private GPA, CPU refers to a private page table whose contents are encrypted. The dedicated APIs to operate on it (e.g. updating/reading its PTE entry) are used, and their cost is expensive. When KVM resolves the KVM page fault, it walks the page tables. To reuse the existing KVM MMU code and mitigate the heavy cost of directly walking the private page table, allocate one more page for the mirrored page table for the KVM MMU code to directly walk. Resolve the KVM page fault with the existing code, and do additional operations necessary for the private page table. To distinguish such cases, the existing KVM page table is called a shared page table (i.e., not associated with a private page table), and the page table with a private page table is called a mirrored page table. The relationship is depicted below. KVM page fault | | | V | -------------+---------- | | | | V V | shared GPA private GPA | | | | V V | shared PT root mirrored PT root | private PT root | | | | V V | V shared PT mirrored PT ----propagate----> private PT | | | | | \-----------------+------\ | | | | | V | V V shared guest page | private guest page | non-encrypted memory | encrypted memory | PT: Page table Shared PT: visible to KVM, and the CPU uses it for shared mappings. Private PT: the CPU uses it, but it is invisible to KVM. TDX module updates this table to map private guest pages. Mirrored PT: It is visible to KVM, but the CPU doesn't use it. KVM uses it to propagate PT change to the actual private PT. Co-developed-by: Yan Zhao Signed-off-by: Yan Zhao Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu --- TDX MMU Part 1: - Rename terminology, dummy PT => mirror PT. and updated the commit message By Rick and Kai. 
- Added a comment on union of private_spt by Rick. - Don't handle the root case in kvm_mmu_alloc_private_spt(), it will not be needed in future patches. (Rick) - Update comments (Yan) - Remove kvm_mmu_init_private_spt(), open code it in later patches (Yan) v19: - typo in the comment in kvm_mmu_alloc_private_spt() - drop CONFIG_KVM_MMU_PRIVATE --- arch/x86/include/asm/kvm_host.h | 5 +++++ arch/x86/kvm/mmu/mmu.c | 7 +++++++ arch/x86/kvm/mmu/mmu_internal.h | 36 +++++++++++++++++++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.c | 1 + 4 files changed, 45 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 13119d4e44e5..d010ca5c7f44 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -828,6 +828,11 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + /* + * This cache is to allocate private page table. E.g. private EPT used + * by the TDX module. + */ + struct kvm_mmu_memory_cache mmu_private_spt_cache; /* * QEMU userspace and the guest each have their own FPU state. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 1998267a330e..d5cf5b15a10e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -685,6 +685,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; + if (kvm_gfn_shared_mask(vcpu->kvm)) { + r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_private_spt_cache, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, PT64_ROOT_MAX_LEVEL); if (r) @@ -704,6 +710,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_private_spt_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index b114589a595a..0f1a9d733d9e 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -101,7 +101,22 @@ struct kvm_mmu_page { int root_count; refcount_t tdp_mmu_root_count; }; - unsigned int unsync_children; + union { + /* Those two members aren't used for TDP MMU */ + struct { + unsigned int unsync_children; + /* + * Number of writes since the last time traversal + * visited this page. + */ + atomic_t write_flooding_count; + }; + /* + * Page table page of private PT. + * Passed to TDX module, not accessed by KVM. + */ + void *private_spt; + }; union { struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ tdp_ptep_t ptep; @@ -124,9 +139,6 @@ struct kvm_mmu_page { int clear_spte_count; #endif - /* Number of writes since the last time traversal visited this page. */ - atomic_t write_flooding_count; - #ifdef CONFIG_X86_64 /* Used for freeing the page asynchronously if it is a TDP MMU page. 
*/ struct rcu_head rcu_head; @@ -150,6 +162,22 @@ static inline bool is_private_sp(const struct kvm_mmu_page *sp) return kvm_mmu_page_role_is_private(sp->role); } +static inline void *kvm_mmu_private_spt(struct kvm_mmu_page *sp) +{ + return sp->private_spt; +} + +static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) +{ + /* + * private_spt is allocated for TDX module to hold private EPT mappings, + * TDX module will initialize the page by itself. + * Therefore, KVM does not need to initialize or access private_spt. + * KVM only interacts with sp->spt for mirrored EPT operations. + */ + sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_spt_cache); +} + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) { /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1086e3b2aa5c..6fa910b017d1 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -53,6 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { + free_page((unsigned long)sp->private_spt); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } From patchwork Wed May 15 00:59:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664499 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B2BAB1EB26; Wed, 15 May 2024 01:00:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734810; cv=none; b=kKAiCysAczkc2spuOD3iynWvWjFgYPQUFMb+PDB36FyHQ90jt2tZQV1sb3KpqnKfS2fWlBZ4xNe2B8TW0nY0kTp/8tMtpR7LfO2upXRu1tCfj85zhvw1nNgrSz2HltM0WXjSiLi1VQiDcotfFK7VfrZgHZAU5yXCBFoj3dfl9Vw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734810; c=relaxed/simple; bh=+6tj0kEKZAuUZugRM5HgXHgBCG9pzGLg+BLWoV5ezpM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=bOPRQNbGqCOkFEdgsuK712/Vb7sdAfkucoMH8hFOSIXtyj3I9W2a70SVogjyw/cFnSnHBZj7K/1bpJ1gqGv6BO1Ctfiu4hJJW2nm2wdzoDTu9wUUTgKfAZVV6TrloKwIEk6E0nMpR9TmrU612Bb4KLNVlQFEvBs5w01ZJAfyK2c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=nOy4ugC1; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="nOy4ugC1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734809; x=1747270809; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+6tj0kEKZAuUZugRM5HgXHgBCG9pzGLg+BLWoV5ezpM=; b=nOy4ugC1lSOA3L5juhZaKHEFM0JrooZMkKrefZuLDxx/7bP/NKmDih6q 93xuDPn9SZcKy7UijJwZqlMzdeLKBDNqflMyXn2RbKzXoErfdq+Z7ud8t pRoS3+L7WH9jNXqAUC2LTezh+joU1MXsmQxLi6+GCuiwfwVbmcT4yes6T 
zb8dHEHTCsGSl/ThrD3FQXuhv+RYG7z+gKfYE2AsJoj19g12ZvDax/gHN lLfYKDgWBA6pYaIgGAn+rAD+cv79z3p0ip6agFLdBCQorprUI7pFcau3k k3tBofCh3557UhYOzkUQUz3NDAYxbcu2M6gLbs2umEotX2dImd1octJfv A==; X-CSE-ConnectionGUID: wODc1ON9TVCTOF3fHbs+lg== X-CSE-MsgGUID: LUCFnbToQb6dY0cP9D6wBw== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613963" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613963" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:05 -0700 X-CSE-ConnectionGUID: bbyX+Y9bRnCQGjviptusEw== X-CSE-MsgGUID: GMk516bPQ/inx3OLKItl4A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942765" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:04 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX Date: Tue, 14 May 2024 17:59:44 -0700 Message-Id: <20240515005952.3410568-9-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When virtualizing some CPU features, KVM uses kvm_zap_gfn_range() to zap guest mappings so they can be faulted in with different PTE properties. For TDX private memory this technique is fundamentally not possible. Remapping private memory requires the guest to "accept" it, and also the needed PTE properties are not currently supported by TDX for private memory. These CPU features are: 1) MTRR update 2) CR0.CD update 3) Non-coherent DMA status update 4) APICV update Since they cannot be supported, they should be blocked from being exercised by a TD. In the case of CRO.CD, the feature is fundamentally not supported for TDX as it cannot see the guest registers. For APICV inhibit it in future changes. Guest MTRR support is more of an interesting case. Supported versions of the TDX module fix the MTRR CPUID bit to 1, but as previously discussed, it is not possible to fully support the feature. This leaves KVM with a few options: - Support a modified version of the architecture where the caching attributes are ignored for private memory. - Don't support MTRRs and treat the set MTRR CPUID bit as a TDX Module bug. With the additional consideration that likely guest MTRR support in KVM will be going away, the later option is the best. Prevent MTRR MSR writes from calling kvm_zap_gfn_range() in future changes. Lastly, the most interesting case is non-coherent DMA status updates. There isn't a way to reject the call. KVM is just notified that there is a non-coherent DMA device attached, and expected to act accordingly. For normal VMs today, that means to start respecting guest PAT. However, recently there has been a proposal to avoid doing this on selfsnoop CPUs (see link). On such CPUs it should not be problematic to simply always configure the EPT to honor guest PAT. 
In future changes TDX can enforce this behavior for shared memory, resulting in shared memory always respecting guest PAT for TDX. So kvm_zap_gfn_range() will not need to be called in this case either. Unfortunately, this will result in different cache attributes between private and shared memory, as private memory is always WB and cannot be changed by the VMM on current TDX modules. But it can't really be helped while also supporting non-coherent DMA devices. Since all callers will be prevented from calling kvm_zap_gfn_range() in future changes, report a bug and terminate the guest if other future changes to KVM result in triggering kvm_zap_gfn_range() for a TD. For lack of a better method currently, use kvm_gfn_shared_mask() to determine if private memory cannot be zapped (as in TDX, the only VM type that sets it). Link: https://lore.kernel.org/all/20240309010929.1403984-6-seanjc@google.com/ Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - Remove support from "KVM: x86/tdp_mmu: Zap leafs only for private memory" - Add this KVM_BUG_ON() instead --- arch/x86/kvm/mmu/mmu.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d5cf5b15a10e..808805b3478d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6528,8 +6528,17 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); - if (tdp_mmu_enabled) + if (tdp_mmu_enabled) { + /* + * kvm_zap_gfn_range() is used when MTRR or PAT memory + * type was changed. TDX can't handle zapping the private + * mapping, but it's ok because KVM doesn't support either of + * those features for TDX. In case a new caller appears, BUG + * the VM if it's called for solutions with private aliases. 
+ */ + KVM_BUG_ON(kvm_gfn_shared_mask(kvm), kvm); flush = kvm_tdp_mmu_zap_leafs(kvm, gfn_start, gfn_end, flush); + } if (flush) kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start); From patchwork Wed May 15 00:59:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664500 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 170C9208CA; Wed, 15 May 2024 01:00:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734810; cv=none; b=b1vCsnPZcwNO0ScMairq4uhrL+ABdJylPRcDllP8IEY+ei5z8pZwCazFF3FRF5r7znVHDtWzx3tTH+v+YHNakGO0ekpLWmPN0Yzm521xZJryzv/96XWv7G9zDwGCr0extwbs8Zn2m/iWTbTKqm0fzhAIdyM4l3qy6ZljGIKtpaI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734810; c=relaxed/simple; bh=F8pMCkdedaDEThDQRu19WnnjslknaUfMOKX2A+N1yxQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=nD8S8lczp9Rfquuhtim4MfeNOXiy+h4ugXeZTdItiS4nLRWja3UtHiMgUoOX3zwhQ35N0nfPBiiQTKDRb6BuKJXslnnFX5fvIF6ppqpb82KK0tSvLGT289APcVL+U+wQMYKeVQpZtBOBLnL4ugxbekYlSmwx6L2qq/21uw7vW7E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=iq4SykKi; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iq4SykKi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734809; x=1747270809; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=F8pMCkdedaDEThDQRu19WnnjslknaUfMOKX2A+N1yxQ=; b=iq4SykKiBCkQRz22jrQ8/Fg8/A2SipiWvWkvsku1XngWBLP9Z2ylxqos SwxC9ZBECrxA5toFzxEfa7L0luihlFJVGr1oEJTdUEiAtHRm2ERiQvHR9 HYzUle236moJVhbfGdQzkWIXjfUle8A7gkPx7luh4geyDLMm32Ebuz4Dj Lw7NKIPEkMeP4pr5HKL1zD2gbSj4R5tvJjBq5FVB1BtZM/fdTi+5XWKpc Er48fqj46pZNW4zptZj3OuHQdz9cqFyYY1RpqaclQyxPctd8yQaXv3wqD djomzv1WJeNbQe+IwzBhCUCjQfo5pDmJ/ehMQgGMuIk9IlT96ya65iGLm A==; X-CSE-ConnectionGUID: aucpdI9cT6emLVw5xIz7CA== X-CSE-MsgGUID: 5K97AqX5Sxq2COkilqo6rA== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613967" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613967" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:06 -0700 X-CSE-ConnectionGUID: AOJ4G5LsTwGNnoW5TOQ9hw== X-CSE-MsgGUID: puttENqYQQS+WPln4B9VHA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942777" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:05 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, 
erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 09/16] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Date: Tue, 14 May 2024 17:59:45 -0700 Message-Id: <20240515005952.3410568-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 The kvm_tdp_mmu_alloc_root() function currently always returns 0. This allows for the caller, mmu_alloc_direct_roots(), to call kvm_tdp_mmu_alloc_root() and also return 0 in one line: return kvm_tdp_mmu_alloc_root(vcpu); So it is useful even though the return value of kvm_tdp_mmu_alloc_root() is always the same. However, in future changes, kvm_tdp_mmu_alloc_root() will be called twice in mmu_alloc_direct_roots(). This will force the first call to either awkwardly handle the return value that will always be zero or ignore it. So change kvm_tdp_mmu_alloc_root() to return void. Do it in a separate change so the future change will be cleaner. Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu/mmu.c | 6 ++++-- arch/x86/kvm/mmu/tdp_mmu.c | 3 +-- arch/x86/kvm/mmu/tdp_mmu.h | 2 +- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 808805b3478d..76f92cb37a96 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3700,8 +3700,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) unsigned i; int r; - if (tdp_mmu_enabled) - return kvm_tdp_mmu_alloc_root(vcpu); + if (tdp_mmu_enabled) { + kvm_tdp_mmu_alloc_root(vcpu); + return 0; + } write_lock(&vcpu->kvm->mmu_lock); r = make_mmu_pages_available(vcpu); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6fa910b017d1..0d6d96d86703 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -224,7 +224,7 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp, tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role); } -int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu = vcpu->arch.mmu; union kvm_mmu_page_role role = mmu->root_role; @@ -285,7 +285,6 @@ int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) */ mmu->root.hpa = __pa(root->spt); mmu->root.pgd = 0; - return 0; } static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 58b55e61bd33..437ddd4937a9 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -10,7 +10,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); -int kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu); +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu); __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) { From patchwork Wed May 15 00:59:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664503 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B28712B9C4; Wed, 15 May 2024 01:00:10 
+0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734813; cv=none; b=SEVEQBg08uT0QgIxWfwTADqIWRrMynMQNZNlHL7d1KWya1sflfu3oQWR5k0D3idwaAa4IfO322WuYADA94iwV0i+9dbPCClVNlOaMHn9ertJt6Je1P+cVNkRKec4Kyq/PKpYwH3vKDOp6czO8tbMfmHissafAOSpBnasg5li8cU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734813; c=relaxed/simple; bh=Ts0Q7br9qwoNR0tYNUlPsMXwR1VYg0z28EqH1RigXkg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=s/ReA4XD5LKmE6ThF6OsgnBRDG3siD//d3PlS4EMzcini7aZE1zDjid/1gIcEYGQZKYwfYizCCLfQhRWA5NdyLP4r6lWnptcR5IpuUEhM0dP3xfVF5YRY21jxCHjBtedaUnAtyjND5cNCjL/kAx4D06AjgcQoEl6Ms7KWVu3nDs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OR6iHeN3; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OR6iHeN3" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734811; x=1747270811; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ts0Q7br9qwoNR0tYNUlPsMXwR1VYg0z28EqH1RigXkg=; b=OR6iHeN3gP1WEX/VcXP9ABpboyDmJbOZejxPDEALNseXeuDwwVhCILHO 16rChk3c+HXt6QH5dhLGwN8L+CQ0uC/X39UWPp698YhaXAW0dyulfxgQX M20YOCgDG7kIRk83lRm4BFf3XtGiP437J15W/DmX74g1XmfRtFclaoAyU mv75byGK202tS4uS6T6vamsfu6jdBN+4W42as7bKWzwIWezZMIKCkx8GU 2jdq9gC6baSPFClKOaHqVA5xTiEDS/q01Av56oUPUtEUWxwRYywpX6e4l hrw3/5HPZn0Fs+PSBTrqcMsxsbzRzlP+mkYFquFJbiLsVXt56SmGJ/qUv g==; X-CSE-ConnectionGUID: E8uWYvgARoCFGu++sO2ANg== X-CSE-MsgGUID: VBgeHVCbTPCEhJgJLg86Jg== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613973" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613973" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:06 -0700 X-CSE-ConnectionGUID: Ub/KEAqETpSJu78Qo3L98w== X-CSE-MsgGUID: 2Fpj+nkiQiqzX5Isi6cI8g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942788" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:05 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Date: Tue, 14 May 2024 17:59:46 -0700 Message-Id: <20240515005952.3410568-11-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Allocate mirrored page table for the private page table and 
implement MMU hooks to operate on the private page table. To handle a page fault to a private GPA, KVM walks the mirrored page table in unencrypted memory and then uses MMU hooks in kvm_x86_ops to propagate changes from the mirrored page table to the private page table.

       private KVM page fault          |
                  |                    |
                  V                    |
             private GPA               |      CPU protected EPTP
                  |                    |              |
                  V                    |              V
          mirrored PT root             |       private PT root
                  |                    |              |
                  V                    |              V
            mirrored PT --hook to propagate-->private PT
                  |                    |              |
                  \--------------------+------\       |
                                       |      |       |
                                       |      V       V
                                       |     private guest page
                                       |
       non-encrypted memory            |      encrypted memory
                                       |

PT: page table
Private PT: the CPU uses it, but it is invisible to KVM. The TDX module manages this table to map private guest pages.
Mirrored PT: it is visible to KVM, but the CPU doesn't use it. KVM uses it to propagate PT changes to the actual private PT.

SPTEs in the mirrored page table (referred to as mirrored SPTEs hereafter) can be modified atomically with mmu_lock held for read; however, the MMU hooks to the private page table are not atomic operations. To address this, a special REMOVED_SPTE is introduced and the sequence below is used when mirrored SPTEs are updated atomically.

1. The mirrored SPTE is first atomically written to REMOVED_SPTE.
2. The successful updater of the mirrored SPTE in step 1 proceeds with the following steps.
3. Invoke MMU hooks to modify the private page table with the target value.
4. (a) If the hook succeeds, update the mirrored SPTE to the target value.
   (b) If the hook fails, restore the mirrored SPTE to the original value.

The KVM TDP MMU ensures other threads will not overwrite REMOVED_SPTE. This sequence also applies when SPTEs are atomically updated from non-present to present in order to prevent potential conflicts when multiple vCPUs attempt to set private SPTEs to a different page size simultaneously, though only 4K page size is currently supported for the private page table. 2M page support can be done in future patches.

Signed-off-by: Isaku Yamahata
Co-developed-by: Kai Huang
Signed-off-by: Kai Huang
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
TDX MMU Part 1:
- Remove unnecessary gfn, access twist in tdp_mmu_map_handle_target_level(). (Chao Gao)
- Open code call to kvm_mmu_alloc_private_spt() instead of doing it in tdp_mmu_alloc_sp()
- Update comment in set_private_spte_present() (Yan)
- Open code call to kvm_mmu_init_private_spt() (Yan)
- Add comments on TDX MMU hooks (Yan)
- Fix various whitespace alignment (Yan)
- Remove pointless warnings and conditionals in handle_removed_private_spte() (Yan)
- Remove redundant lockdep assert in tdp_mmu_set_spte() (Yan)
- Remove incorrect comment in handle_changed_spte() (Yan)
- Remove unneeded kvm_pfn_to_refcounted_page() and is_error_noslot_pfn() check in kvm_tdp_mmu_map() (Yan)
- Do kvm_gfn_for_root() branchless (Rick)
- Update kvm_tdp_mmu_alloc_root() callers to not check error code (Rick)
- Add comment for stripping shared bit for fault.gfn (Chao)
v19:
- drop CONFIG_KVM_MMU_PRIVATE
v18:
- Rename freezed => frozen
v14 -> v15:
- Refined is_private condition check in kvm_tdp_mmu_map(). Add kvm_gfn_shared_mask() check.
- catch up for struct kvm_range change --- arch/x86/include/asm/kvm-x86-ops.h | 5 + arch/x86/include/asm/kvm_host.h | 25 +++ arch/x86/kvm/mmu/mmu.c | 13 +- arch/x86/kvm/mmu/mmu_internal.h | 19 +- arch/x86/kvm/mmu/tdp_iter.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 269 +++++++++++++++++++++++++---- arch/x86/kvm/mmu/tdp_mmu.h | 2 +- 7 files changed, 293 insertions(+), 42 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h index 566d19b02483..d13cb4b8fce6 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -95,6 +95,11 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) +KVM_X86_OP_OPTIONAL(link_private_spt) +KVM_X86_OP_OPTIONAL(free_private_spt) +KVM_X86_OP_OPTIONAL(set_private_spte) +KVM_X86_OP_OPTIONAL(remove_private_spte) +KVM_X86_OP_OPTIONAL(zap_private_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d010ca5c7f44..20fa8fa58692 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -470,6 +470,7 @@ struct kvm_mmu { int (*sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i); struct kvm_mmu_root_info root; + hpa_t private_root_hpa; union kvm_cpu_role cpu_role; union kvm_mmu_page_role root_role; @@ -1747,6 +1748,30 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); + /* Add a page as page table page into private page table */ + int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); + /* + * Free a page table page of private page table. + * Only expected to be called when guest is not active, specifically + * during VM destruction phase. 
+ */ + int (*free_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); + + /* Add a guest private page into private page table */ + int (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + kvm_pfn_t pfn); + + /* Remove a guest private page from private page table*/ + int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + kvm_pfn_t pfn); + /* + * Keep a guest private page mapped in private page table, but clear its + * present bit + */ + int (*zap_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level); + bool (*has_wbinvd_exit)(void); u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 76f92cb37a96..2506d6277818 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3701,7 +3701,9 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) int r; if (tdp_mmu_enabled) { - kvm_tdp_mmu_alloc_root(vcpu); + if (kvm_gfn_shared_mask(vcpu->kvm)) + kvm_tdp_mmu_alloc_root(vcpu, true); + kvm_tdp_mmu_alloc_root(vcpu, false); return 0; } @@ -4685,7 +4687,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) if (kvm_mmu_honors_guest_mtrrs(vcpu->kvm)) { for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) { int page_num = KVM_PAGES_PER_HPAGE(fault->max_level); - gfn_t base = gfn_round_for_level(fault->gfn, + gfn_t base = gfn_round_for_level(gpa_to_gfn(fault->addr), fault->max_level); if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num)) @@ -6245,6 +6247,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) mmu->root.hpa = INVALID_PAGE; mmu->root.pgd = 0; + mmu->private_root_hpa = INVALID_PAGE; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID; @@ -7263,6 +7266,12 @@ int kvm_mmu_vendor_module_init(void) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); + if (tdp_mmu_enabled) { + read_lock(&vcpu->kvm->mmu_lock); + mmu_free_root_page(vcpu->kvm, &vcpu->arch.mmu->private_root_hpa, + NULL); + read_unlock(&vcpu->kvm->mmu_lock); + } free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 0f1a9d733d9e..3a7fe9261e23 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -6,6 +6,8 @@ #include #include +#include "mmu.h" + #ifdef CONFIG_KVM_PROVE_MMU #define KVM_MMU_WARN_ON(x) WARN_ON_ONCE(x) #else @@ -178,6 +180,16 @@ static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_m sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_spt_cache); } +static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page *root, + gfn_t gfn) +{ + gfn_t gfn_for_root = kvm_gfn_to_private(kvm, gfn); + + /* Set shared bit if not private */ + gfn_for_root |= -(gfn_t)!is_private_sp(root) & kvm_gfn_shared_mask(kvm); + return gfn_for_root; +} + static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp) { /* @@ -348,7 +360,12 @@ static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gp int r; if (vcpu->arch.mmu->root_role.direct) { - fault.gfn = fault.addr >> PAGE_SHIFT; + /* + * Things like memslots don't understand the concept of a shared + * bit. Strip it so that the GFN can be used like normal, and the + * fault.addr can be used when the shared bit is needed. 
+ */ + fault.gfn = gpa_to_gfn(fault.addr) & ~kvm_gfn_shared_mask(vcpu->kvm); fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn); } diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h index fae559559a80..8a64bcef9deb 100644 --- a/arch/x86/kvm/mmu/tdp_iter.h +++ b/arch/x86/kvm/mmu/tdp_iter.h @@ -91,7 +91,7 @@ struct tdp_iter { tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL]; /* A pointer to the current SPTE */ tdp_ptep_t sptep; - /* The lowest GFN mapped by the current SPTE */ + /* The lowest GFN (shared bits included) mapped by the current SPTE */ gfn_t gfn; /* The level of the root page given to the iterator */ int root_level; diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0d6d96d86703..810d552e9bf6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -224,7 +224,7 @@ static void tdp_mmu_init_child_sp(struct kvm_mmu_page *child_sp, tdp_mmu_init_sp(child_sp, iter->sptep, iter->gfn, role); } -void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool private) { struct kvm_mmu *mmu = vcpu->arch.mmu; union kvm_mmu_page_role role = mmu->root_role; @@ -232,6 +232,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) struct kvm *kvm = vcpu->kvm; struct kvm_mmu_page *root; + if (private) + kvm_mmu_page_role_set_private(&role); + /* * Check for an existing root before acquiring the pages lock to avoid * unnecessary serialization if multiple vCPUs are loading a new root. @@ -283,13 +286,17 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu) * and actually consuming the root if it's invalidated after dropping * mmu_lock, and the root can't be freed as this vCPU holds a reference. */ - mmu->root.hpa = __pa(root->spt); - mmu->root.pgd = 0; + if (private) { + mmu->private_root_hpa = __pa(root->spt); + } else { + mmu->root.hpa = __pa(root->spt); + mmu->root.pgd = 0; + } } static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared); + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared); static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { @@ -416,12 +423,124 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) REMOVED_SPTE, level); } handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, - old_spte, REMOVED_SPTE, level, shared); + old_spte, REMOVED_SPTE, sp->role, + shared); + } + + if (is_private_sp(sp) && + WARN_ON(static_call(kvm_x86_free_private_spt)(kvm, sp->gfn, sp->role.level, + kvm_mmu_private_spt(sp)))) { + /* + * Failed to free page table page in private page table and + * there is nothing to do further. + * Intentionally leak the page to prevent the kernel from + * accessing the encrypted page. 
+ */ + sp->private_spt = NULL; } call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } +static void *get_private_spt(gfn_t gfn, u64 new_spte, int level) +{ + if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) { + struct kvm_mmu_page *sp = to_shadow_page(pfn_to_hpa(spte_to_pfn(new_spte))); + void *private_spt = kvm_mmu_private_spt(sp); + + WARN_ON_ONCE(!private_spt); + WARN_ON_ONCE(sp->role.level + 1 != level); + WARN_ON_ONCE(sp->gfn != gfn); + return private_spt; + } + + return NULL; +} + +static void handle_removed_private_spte(struct kvm *kvm, gfn_t gfn, + u64 old_spte, u64 new_spte, + int level) +{ + bool was_present = is_shadow_present_pte(old_spte); + bool was_leaf = was_present && is_last_spte(old_spte, level); + kvm_pfn_t old_pfn = spte_to_pfn(old_spte); + int ret; + + /* + * Allow only leaf page to be zapped. Reclaim non-leaf page tables page + * at destroying VM. + */ + if (!was_leaf) + return; + + /* Zapping leaf spte is allowed only when write lock is held. */ + lockdep_assert_held_write(&kvm->mmu_lock); + ret = static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); + /* Because write lock is held, operation should success. */ + if (KVM_BUG_ON(ret, kvm)) + return; + + ret = static_call(kvm_x86_remove_private_spte)(kvm, gfn, level, old_pfn); + KVM_BUG_ON(ret, kvm); +} + +static int __must_check __set_private_spte_present(struct kvm *kvm, tdp_ptep_t sptep, + gfn_t gfn, u64 old_spte, + u64 new_spte, int level) +{ + bool was_present = is_shadow_present_pte(old_spte); + bool is_present = is_shadow_present_pte(new_spte); + bool is_leaf = is_present && is_last_spte(new_spte, level); + kvm_pfn_t new_pfn = spte_to_pfn(new_spte); + int ret = 0; + + lockdep_assert_held(&kvm->mmu_lock); + /* TDP MMU doesn't change present -> present */ + KVM_BUG_ON(was_present, kvm); + + /* + * Use different call to either set up middle level + * private page table, or leaf. + */ + if (is_leaf) { + ret = static_call(kvm_x86_set_private_spte)(kvm, gfn, level, new_pfn); + } else { + void *private_spt = get_private_spt(gfn, new_spte, level); + + KVM_BUG_ON(!private_spt, kvm); + ret = static_call(kvm_x86_link_private_spt)(kvm, gfn, level, private_spt); + } + + return ret; +} + +static int __must_check set_private_spte_present(struct kvm *kvm, tdp_ptep_t sptep, + gfn_t gfn, u64 old_spte, + u64 new_spte, int level) +{ + int ret; + + /* + * For private page table, callbacks are needed to propagate SPTE + * change into the private page table. In order to atomically update + * both the SPTE and the private page tables with callbacks, utilize + * freezing SPTE. + * - Freeze the SPTE. Set entry to REMOVED_SPTE. + * - Trigger callbacks for private page tables. + * - Unfreeze the SPTE. Set the entry to new_spte. 
+ */ + lockdep_assert_held(&kvm->mmu_lock); + if (!try_cmpxchg64(sptep, &old_spte, REMOVED_SPTE)) + return -EBUSY; + + ret = __set_private_spte_present(kvm, sptep, gfn, old_spte, new_spte, level); + if (ret) + __kvm_tdp_mmu_write_spte(sptep, old_spte); + else + __kvm_tdp_mmu_write_spte(sptep, new_spte); + return ret; +} + /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -429,7 +548,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) * @gfn: the base GFN that was mapped by the SPTE * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change - * @level: the level of the PT the SPTE is part of in the paging structure + * @role: the role of the PT the SPTE is part of in the paging structure * @shared: This operation may not be running under the exclusive use of * the MMU lock and the operation must synchronize with other * threads that might be modifying SPTEs. @@ -439,14 +558,18 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared) * and fast_pf_fix_direct_spte()). */ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) + u64 old_spte, u64 new_spte, + union kvm_mmu_page_role role, bool shared) { + bool is_private = kvm_mmu_page_role_is_private(role); + int level = role.level; bool was_present = is_shadow_present_pte(old_spte); bool is_present = is_shadow_present_pte(new_spte); bool was_leaf = was_present && is_last_spte(old_spte, level); bool is_leaf = is_present && is_last_spte(new_spte, level); - bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte); + kvm_pfn_t old_pfn = spte_to_pfn(old_spte); + kvm_pfn_t new_pfn = spte_to_pfn(new_spte); + bool pfn_changed = old_pfn != new_pfn; WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL); WARN_ON_ONCE(level < PG_LEVEL_4K); @@ -513,7 +636,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, if (was_leaf && is_dirty_spte(old_spte) && (!is_present || !is_dirty_spte(new_spte) || pfn_changed)) - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + kvm_set_pfn_dirty(old_pfn); /* * Recursively handle child PTs if the change removed a subtree from @@ -522,15 +645,21 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, * pages are kernel allocations and should never be migrated. */ if (was_present && !was_leaf && - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { + KVM_BUG_ON(is_private != is_private_sptep(spte_to_child_pt(old_spte, level)), + kvm); handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); + } + + if (is_private && !is_present) + handle_removed_private_spte(kvm, gfn, old_spte, new_spte, role.level); if (was_leaf && is_accessed_spte(old_spte) && (!is_present || !is_accessed_spte(new_spte) || pfn_changed)) kvm_set_pfn_accessed(spte_to_pfn(old_spte)); } -static inline int __tdp_mmu_set_spte_atomic(struct tdp_iter *iter, u64 new_spte) +static inline int __tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { u64 *sptep = rcu_dereference(iter->sptep); @@ -542,15 +671,42 @@ static inline int __tdp_mmu_set_spte_atomic(struct tdp_iter *iter, u64 new_spte) */ WARN_ON_ONCE(iter->yielded || is_removed_spte(iter->old_spte)); - /* - * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and - * does not hold the mmu_lock. On failure, i.e. 
if a different logical - * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with - * the current value, so the caller operates on fresh data, e.g. if it - * retries tdp_mmu_set_spte_atomic() - */ - if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) - return -EBUSY; + if (is_private_sptep(iter->sptep) && !is_removed_spte(new_spte)) { + int ret; + + if (is_shadow_present_pte(new_spte)) { + /* + * Populating case. + * - set_private_spte_present() implements + * 1) Freeze SPTE + * 2) call hooks to update private page table, + * 3) update SPTE to new_spte + * - handle_changed_spte() only updates stats. + */ + ret = set_private_spte_present(kvm, iter->sptep, iter->gfn, + iter->old_spte, new_spte, iter->level); + if (ret) + return ret; + } else { + /* + * Zapping case. + * Zap is only allowed when write lock is held + */ + if (WARN_ON_ONCE(!is_shadow_present_pte(new_spte))) + return -EBUSY; + } + } else { + /* + * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs + * and does not hold the mmu_lock. On failure, i.e. if a + * different logical CPU modified the SPTE, try_cmpxchg64() + * updates iter->old_spte with the current value, so the caller + * operates on fresh data, e.g. if it retries + * tdp_mmu_set_spte_atomic() + */ + if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) + return -EBUSY; + } return 0; } @@ -576,23 +732,24 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { + u64 *sptep = rcu_dereference(iter->sptep); int ret; lockdep_assert_held_read(&kvm->mmu_lock); - ret = __tdp_mmu_set_spte_atomic(iter, new_spte); + ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte); if (ret) return ret; handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, - new_spte, iter->level, true); - + new_spte, sptep_to_sp(sptep)->role, true); return 0; } static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, struct tdp_iter *iter) { + union kvm_mmu_page_role role; int ret; lockdep_assert_held_read(&kvm->mmu_lock); @@ -605,7 +762,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, * Delay processing of the zapped SPTE until after TLBs are flushed and * the REMOVED_SPTE is replaced (see below). */ - ret = __tdp_mmu_set_spte_atomic(iter, REMOVED_SPTE); + ret = __tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE); if (ret) return ret; @@ -619,6 +776,8 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, */ __kvm_tdp_mmu_write_spte(iter->sptep, SHADOW_NONPRESENT_VALUE); + + role = sptep_to_sp(iter->sptep)->role; /* * Process the zapped SPTE after flushing TLBs, and after replacing * REMOVED_SPTE with 0. This minimizes the amount of time vCPUs are @@ -626,7 +785,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, * SPTEs. 
*/ handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, - 0, iter->level, true); + SHADOW_NONPRESENT_VALUE, role, true); return 0; } @@ -648,6 +807,8 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm, static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, u64 old_spte, u64 new_spte, gfn_t gfn, int level) { + union kvm_mmu_page_role role; + lockdep_assert_held_write(&kvm->mmu_lock); /* @@ -660,8 +821,16 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, WARN_ON_ONCE(is_removed_spte(old_spte) || is_removed_spte(new_spte)); old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); + if (is_private_sptep(sptep) && !is_removed_spte(new_spte) && + is_shadow_present_pte(new_spte)) { + /* Because write spin lock is held, no race. It should success. */ + KVM_BUG_ON(__set_private_spte_present(kvm, sptep, gfn, old_spte, + new_spte, level), kvm); + } - handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); + role = sptep_to_sp(sptep)->role; + role.level = level; + handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, false); return old_spte; } @@ -684,8 +853,11 @@ static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter, continue; \ else -#define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, root_to_sp(_mmu->root.hpa), _start, _end) +#define tdp_mmu_for_each_pte(_iter, _mmu, _private, _start, _end) \ + for_each_tdp_pte(_iter, \ + root_to_sp((_private) ? _mmu->private_root_hpa : \ + _mmu->root.hpa), \ + _start, _end) /* * Yield if the MMU lock is contended or this thread needs to return control @@ -853,6 +1025,14 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root, lockdep_assert_held_write(&kvm->mmu_lock); + /* + * start and end doesn't have GFN shared bit. This function zaps + * a region including alias. Adjust shared bit of [start, end) if the + * root is shared. + */ + start = kvm_gfn_for_root(kvm, root, start); + end = kvm_gfn_for_root(kvm, root, end); + rcu_read_lock(); for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) { @@ -1029,8 +1209,8 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL); else wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + fault->pfn, iter->old_spte, fault->prefetch, true, + fault->map_writable, &new_spte); if (new_spte == iter->old_spte) ret = RET_PF_SPURIOUS; @@ -1108,6 +1288,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) struct kvm *kvm = vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; + gfn_t raw_gfn; + bool is_private = fault->is_private && kvm_gfn_shared_mask(kvm); int ret = RET_PF_RETRY; kvm_mmu_hugepage_adjust(vcpu, fault); @@ -1116,7 +1298,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) rcu_read_lock(); - tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) { + raw_gfn = gpa_to_gfn(fault->addr); + + tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn + 1) { int r; if (fault->nx_huge_page_workaround_enabled) @@ -1142,14 +1326,22 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) * needs to be split. 
*/ sp = tdp_mmu_alloc_sp(vcpu); + if (kvm_is_private_gpa(kvm, raw_gfn << PAGE_SHIFT)) + kvm_mmu_alloc_private_spt(vcpu, sp); tdp_mmu_init_child_sp(sp, &iter); sp->nx_huge_page_disallowed = fault->huge_page_disallowed; - if (is_shadow_present_pte(iter.old_spte)) + if (is_shadow_present_pte(iter.old_spte)) { + /* + * TODO: large page support. + * Doesn't support large page for TDX now + */ + KVM_BUG_ON(is_private_sptep(iter.sptep), vcpu->kvm); r = tdp_mmu_split_huge_page(kvm, &iter, sp, true); - else + } else { r = tdp_mmu_link_sp(kvm, &iter, sp, true); + } /* * Force the guest to retry if installing an upper level SPTE @@ -1780,7 +1972,7 @@ static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + tdp_mmu_for_each_pte(iter, mmu, is_private, gfn, gfn + 1) { leaf = iter.level; sptes[leaf] = iter.old_spte; } @@ -1838,7 +2030,10 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr, gfn_t gfn = addr >> PAGE_SHIFT; tdp_ptep_t sptep = NULL; - tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { + /* fast page fault for private GPA isn't supported. */ + WARN_ON_ONCE(kvm_is_private_gpa(vcpu->kvm, addr)); + + tdp_mmu_for_each_pte(iter, mmu, false, gfn, gfn + 1) { *spte = iter.old_spte; sptep = iter.sptep; } diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 437ddd4937a9..ac350c51bc18 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -10,7 +10,7 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm); void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm); -void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu); +void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool private); __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) { From patchwork Wed May 15 00:59:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664502 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D95602BB0A; Wed, 15 May 2024 01:00:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734812; cv=none; b=QIfMh2syZJKHVWZRXm80npvQ0yqvoSrunVtDs5jAEDKef6TOvJeSFmqAbfG8kjKGW6Q+U7jjl7XJjgFUXuUViIB1GMU+Lw2xUsPiraVj4ah52Rlz86uad3IZh+YjLtJKCEDPvO38gBJU7VFitCrsuwzXF/y2adGWAWQ9cgPJiCI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734812; c=relaxed/simple; bh=cTrLZlU5/ZsvuUDUsSy+jSAeCzN3jetOaTy8YedbsKY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hFehqK7e4pVrkthz2yPiLXHCn3LCBUhrViq5mujy+1YzJYNT66MNTQLTakJ+FkHeowNTZI8P88a2imrywgY2QHxBUhFwI3ESzGcFuTxbH9MWwctEvxCTHWvFGV0ZMGtef714+YOZXYgdHqtSaxYAIFenYLCo7IzmuL3M6eL7g44= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AFoK7OyV; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AFoK7OyV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734811; x=1747270811; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cTrLZlU5/ZsvuUDUsSy+jSAeCzN3jetOaTy8YedbsKY=; b=AFoK7OyVwszSq/NaygsTlXgz4FubPqVXCMSBsiTP24quWepFWmyfN0r8 2jjZL+SNxWp1mg23f5Wc31P9QjxsQmymxyf4b/Tgqy0iWWaOuJNuE0ffj l7XBPlxyGemynYxfFtUQ3POh66/EG6crTXBvB5rbgb9ZKkHJgI/BfQkiK AoyOA05tp/UGnGcAjgfg8IXzHRGf+9QQqnw4dTsmw1+e35PWKvPRI7zta 7A4PHElULibQSvEhtGJjJf8ZW7xiLBPyj9UKw5e0hKZMMepKr3QqBPdNl KwEMpTQg7yCYt98+UFC69loMlIZAf8h9aNfPj8rPKDW+JqXpaTUpw5ihV g==; X-CSE-ConnectionGUID: ji/7WyQsQ1ieWBEVEEc7FA== X-CSE-MsgGUID: DS9xSeS9TPqPR9B02H7Ltg== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613977" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613977" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:07 -0700 X-CSE-ConnectionGUID: tzIbvwIoRTeatS/4eNNkEg== X-CSE-MsgGUID: loBjX6+QQAOazgFTj34OcA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942799" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:06 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 11/16] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Date: Tue, 14 May 2024 17:59:47 -0700 Message-Id: <20240515005952.3410568-12-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Extract tdp_mmu_root_match() to check if the root has given types and use it for the root page table iterator. It checks only_invalid now. TDX KVM operates on a shared page table only (Shared-EPT), a mirrored page table only (Secure-EPT), or both based on the operation. KVM MMU notifier operations only on shared page table. KVM guest_memfd invalidation operations only on mirrored page table, and so on. Introduce a centralized matching function instead of open coding matching logic in the iterator. 
The next step is to extend the function to check whether the page is shared or private Link: https://lore.kernel.org/kvm/ZivazWQw1oCU8VBC@google.com/ Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 810d552e9bf6..a0b7c43e843d 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -92,6 +92,14 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback); } +static bool tdp_mmu_root_match(struct kvm_mmu_page *root, bool only_valid) +{ + if (only_valid && root->role.invalid) + return false; + + return true; +} + /* * Returns the next root after @prev_root (or the first root if @prev_root is * NULL). A reference to the returned root is acquired, and the reference to @@ -125,7 +133,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, typeof(*next_root), link); while (next_root) { - if ((!only_valid || !next_root->role.invalid) && + if (tdp_mmu_root_match(next_root, only_valid) && kvm_tdp_mmu_get_root(next_root)) break; @@ -176,7 +184,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link) \ if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) && \ ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ - ((_only_valid) && (_root)->role.invalid))) { \ + !tdp_mmu_root_match((_root), (_only_valid)))) { \ } else #define for_each_tdp_mmu_root(_kvm, _root, _as_id) \ From patchwork Wed May 15 00:59:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664504 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20C7D364AE; Wed, 15 May 2024 01:00:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734813; cv=none; b=WcZ2uI6wphBIhEUjjwFQQzY+812yYlP8aMLjZKcMgj/b4Y0ecjTaC2+2UE5sLqDEHB3z/MG5r0nIylx6ayhGMv4LsnsgfnJS8JmdCG3lf28bWFVz7Vod8EeqfOcmWFOODF3Ejmc+/whwLDS+ojh2dxOnXUIgjPFDBudnkAsfegg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734813; c=relaxed/simple; bh=hwT7Hnp4s9Rtxpw8GmgoIARXgPDTRvl14/AMhCiqMB0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=LhHuIF/uawS0Q6EoO13WxzQEICcdE/KATpa7I+8ZE6qQWqB0plDMqi5ujg5GBbhBMNsegtSxMsZFwvzL7yJUvb+DwImhFZoZOtQh0Bm37TxnV5Wsi5cGjxT+GbIWHy3V22P+bgoXmdWGniqV8q+78u0Kky5UX+AT17FMA/QhAxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ilYMBWQp; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ilYMBWQp" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; 
d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734812; x=1747270812; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hwT7Hnp4s9Rtxpw8GmgoIARXgPDTRvl14/AMhCiqMB0=; b=ilYMBWQpZp+ecPXQJNPTj3T1t9ErvQBDGfxncmpVPIWogmyfAavvmMqc ZwvTqnO2fK1lVy1jbIhul2qXG7KOwdFefCVEKzct+IHT2VBctAM2pLsm3 kmMuTV4CDxs51SlgHwtgSHFGDxsyAOXvoKru4DJX01x/lBUsIkfobz5RK pkucL9SPR5QOUj30hbC6ha+W9cavBSuFB7LCHr31ppmgxdQEb508G2sd7 u8zdsSZeemi28r3GHIwaClQVejCwUQmHaEzYG28eBxyD/TlfgsTGekd3g RHsBH0BD1jhqZS2rkX81PAa+h2p9EkcXqZRebX47v3cMFD8g/haBPxmMx g==; X-CSE-ConnectionGUID: 9xW0yRnRSxu/XRoz1glDsw== X-CSE-MsgGUID: XCMbw4nFS6yAZpDMY9CMkw== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613981" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613981" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:07 -0700 X-CSE-ConnectionGUID: TUPxIWNtQLi04SmwZHLs4g== X-CSE-MsgGUID: aLwVOWGzReuTVOfZnQUmNQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942814" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:06 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 12/16] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Date: Tue, 14 May 2024 17:59:48 -0700 Message-Id: <20240515005952.3410568-13-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Define an enum kvm_tdp_mmu_root_types to specify the KVM MMU root type [1] so that the iterator on the root page table can consistently filter the root page table type instead of only_valid. TDX KVM will operate on KVM page tables with specified types. Shared page table, private page table, or both. Introduce an enum instead of bool only_valid so that we can easily enhance page table types applicable to shared, private, or both in addition to valid or not. Replace only_valid=false with KVM_ANY_ROOTS and only_valid=true with KVM_ANY_VALID_ROOTS. Use KVM_ANY_ROOTS and KVM_ANY_VALID_ROOTS to wrap KVM_VALID_ROOTS to avoid further code churn when shared and private are introduced. Link: https://lore.kernel.org/kvm/ZivazWQw1oCU8VBC@google.com/ [1] Suggested-by: Sean Christopherson Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - Newly introduced. 
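For readers skimming the diff below, here is a minimal stand-alone sketch of the filtering idea in user-space C. Only the enum values mirror the patch; struct root, root_match() and main() are simplified stand-ins, not KVM code. The point is that callers pass an OR-able mask instead of a bool, so new root classes can be added later without touching every call site.

  #include <stdbool.h>
  #include <stdio.h>

  enum kvm_tdp_mmu_root_types {
          KVM_VALID_ROOTS     = 1 << 0,
          KVM_ANY_ROOTS       = 0,
          KVM_ANY_VALID_ROOTS = KVM_VALID_ROOTS,
  };

  /* Simplified stand-in for struct kvm_mmu_page. */
  struct root { bool invalid; };

  /* Mirrors tdp_mmu_root_match(): reject invalid roots only when asked to. */
  static bool root_match(const struct root *root, enum kvm_tdp_mmu_root_types types)
  {
          if ((types & KVM_VALID_ROOTS) && root->invalid)
                  return false;
          return true;
  }

  int main(void)
  {
          struct root stale = { .invalid = true };

          /* Prints "1 0": ANY_ROOTS keeps the stale root, ANY_VALID_ROOTS skips it. */
          printf("%d %d\n", root_match(&stale, KVM_ANY_ROOTS),
                 root_match(&stale, KVM_ANY_VALID_ROOTS));
          return 0;
  }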
--- arch/x86/kvm/mmu/tdp_mmu.c | 39 +++++++++++++++++++------------------- arch/x86/kvm/mmu/tdp_mmu.h | 7 +++++++ 2 files changed, 27 insertions(+), 19 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a0b7c43e843d..7af395073e92 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -92,9 +92,10 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback); } -static bool tdp_mmu_root_match(struct kvm_mmu_page *root, bool only_valid) +static bool tdp_mmu_root_match(struct kvm_mmu_page *root, + enum kvm_tdp_mmu_root_types types) { - if (only_valid && root->role.invalid) + if ((types & KVM_VALID_ROOTS) && root->role.invalid) return false; return true; @@ -102,17 +103,17 @@ static bool tdp_mmu_root_match(struct kvm_mmu_page *root, bool only_valid) /* * Returns the next root after @prev_root (or the first root if @prev_root is - * NULL). A reference to the returned root is acquired, and the reference to - * @prev_root is released (the caller obviously must hold a reference to - * @prev_root if it's non-NULL). + * NULL) that matches with @types. A reference to the returned root is + * acquired, and the reference to @prev_root is released (the caller obviously + * must hold a reference to @prev_root if it's non-NULL). * - * If @only_valid is true, invalid roots are skipped. + * Roots that doesn't match with @types are skipped. * * Returns NULL if the end of tdp_mmu_roots was reached. */ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, struct kvm_mmu_page *prev_root, - bool only_valid) + enum kvm_tdp_mmu_root_types types) { struct kvm_mmu_page *next_root; @@ -133,7 +134,7 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, typeof(*next_root), link); while (next_root) { - if (tdp_mmu_root_match(next_root, only_valid) && + if (tdp_mmu_root_match(next_root, types) && kvm_tdp_mmu_get_root(next_root)) break; @@ -158,20 +159,20 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, * If shared is set, this function is operating under the MMU lock in read * mode. */ -#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _only_valid) \ - for (_root = tdp_mmu_next_root(_kvm, NULL, _only_valid); \ +#define __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, _types) \ + for (_root = tdp_mmu_next_root(_kvm, NULL, _types); \ ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root; \ - _root = tdp_mmu_next_root(_kvm, _root, _only_valid)) \ + _root = tdp_mmu_next_root(_kvm, _root, _types)) \ if (_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) { \ } else #define for_each_valid_tdp_mmu_root_yield_safe(_kvm, _root, _as_id) \ - __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, true) + __for_each_tdp_mmu_root_yield_safe(_kvm, _root, _as_id, KVM_ANY_VALID_ROOTS) #define for_each_tdp_mmu_root_yield_safe(_kvm, _root) \ - for (_root = tdp_mmu_next_root(_kvm, NULL, false); \ + for (_root = tdp_mmu_next_root(_kvm, NULL, KVM_ANY_ROOTS); \ ({ lockdep_assert_held(&(_kvm)->mmu_lock); }), _root; \ - _root = tdp_mmu_next_root(_kvm, _root, false)) + _root = tdp_mmu_next_root(_kvm, _root, KVM_ANY_ROOTS)) /* * Iterate over all TDP MMU roots. Requires that mmu_lock be held for write, @@ -180,18 +181,18 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct kvm *kvm, * Holding mmu_lock for write obviates the need for RCU protection as the list * is guaranteed to be stable. 
*/ -#define __for_each_tdp_mmu_root(_kvm, _root, _as_id, _only_valid) \ +#define __for_each_tdp_mmu_root(_kvm, _root, _as_id, _types) \ list_for_each_entry(_root, &_kvm->arch.tdp_mmu_roots, link) \ if (kvm_lockdep_assert_mmu_lock_held(_kvm, false) && \ ((_as_id >= 0 && kvm_mmu_page_as_id(_root) != _as_id) || \ - !tdp_mmu_root_match((_root), (_only_valid)))) { \ + !tdp_mmu_root_match((_root), (_types)))) { \ } else #define for_each_tdp_mmu_root(_kvm, _root, _as_id) \ - __for_each_tdp_mmu_root(_kvm, _root, _as_id, false) + __for_each_tdp_mmu_root(_kvm, _root, _as_id, KVM_ANY_ROOTS) #define for_each_valid_tdp_mmu_root(_kvm, _root, _as_id) \ - __for_each_tdp_mmu_root(_kvm, _root, _as_id, true) + __for_each_tdp_mmu_root(_kvm, _root, _as_id, KVM_ANY_VALID_ROOTS) static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) { @@ -1389,7 +1390,7 @@ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, { struct kvm_mmu_page *root; - __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, false) + __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, KVM_ANY_ROOTS) flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end, range->may_block, flush); diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index ac350c51bc18..30f2ab88a642 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -19,6 +19,13 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root); +enum kvm_tdp_mmu_root_types { + KVM_VALID_ROOTS = BIT(0), + + KVM_ANY_ROOTS = 0, + KVM_ANY_VALID_ROOTS = KVM_VALID_ROOTS, +}; + bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); From patchwork Wed May 15 00:59:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664505 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6AAE374F5; Wed, 15 May 2024 01:00:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734814; cv=none; b=qiMJi0MRjZxVm+wj8nEcmYYowne7GNTphCNMb8S4xZdQpfi/ZkEnnec8Dvvu1pyuiquVLcrgRM8CFtTzUUv9wQ+lS48VeXbr7iE1AT2rGh5L98JSoZ/wJM+nMeyg75mJF00E/Mp85MR3IoD0tgx8ulJjiT7X/V8TAFvZLOT4ncQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734814; c=relaxed/simple; bh=Qkp1GU5j+YcdfSEFw2npWErCnczzFIzBQgNqWdkwsPI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Iqov8m4AvY0KXAxPgf3H0CHl1mHCb5kBD+JFCB8iCv6rmc4MoAHnJbtfdADR29wUXcIFtrULY5D32ZSnyWyh+ZqXSJb8lasuJXN/t4Q7k0bQSME7+tISKn/1RNEv+OGQWEBvXnMWcoTF95KoXR2ubv04Fa7jKVLIr7hFfqetehk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=CYE3EWNf; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="CYE3EWNf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734813; x=1747270813; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Qkp1GU5j+YcdfSEFw2npWErCnczzFIzBQgNqWdkwsPI=; b=CYE3EWNfzC/Aj6mXMu0M2pN+C4Sz7TzhUgQlEfTHTXm97i3K/rlktZFO xr3c5xxcPQgauF0f9p5BsXmS5Rc0Jbu2HsAuh5jlYaLz3jGLByrvONTvZ SWvL5v7CJEGNNky7lcIviWHMX8BWpRgrIiFpWq5WWwPnC1Qf6NYaVhkpZ b7WbD985SSaEI0Ig3BZhhr+bcnplnkcquifxM44/m+cZQDKTJT5loYq5c 8MuScPqQ/3fAVVO4VfPSverBO7I5aeTda9hRMihYuJWDBwXfK1le3UBAF obddVj8dXvm2lp04AU+2JA2GweB8sF9Z4ZpEKO6xsA0b0LW286dUfMB0+ g==; X-CSE-ConnectionGUID: ZkkOwBu+SdGhZLK+gS796A== X-CSE-MsgGUID: sRYoHgc3TxSJTh0QCVjNgA== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613986" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613986" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:08 -0700 X-CSE-ConnectionGUID: i110ikQjRa6RJzaukZ0gxQ== X-CSE-MsgGUID: SK7QFYhpSkuSIdoCM5jSfw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942825" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:07 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 13/16] KVM: x86/tdp_mmu: Introduce shared, private KVM MMU root types Date: Tue, 14 May 2024 17:59:49 -0700 Message-Id: <20240515005952.3410568-14-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Add shared and private types to enum kvm_tdp_mmu_root_types to specify KVM MMU roots [1] so that the root iterators can consistently filter on the root page table type. TDX KVM will operate on KVM page tables of a specified type: shared, private, or both. Extend the enum to cover those page table types and make the iterators take the requested root types, i.e. valid or not, combined with shared, private, or both. Enhance tdp_mmu_root_match() to understand private vs shared.
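For illustration, here is a minimal stand-alone sketch of how these root-type bits are meant to compose and filter roots. It mirrors the core of the new tdp_mmu_root_match() (the WARN_ON_ONCE() sanity checks are omitted); the BIT(0)/BIT(1) values and the mock_root fields are assumptions standing in for KVM_PROCESS_SHARED/KVM_PROCESS_PRIVATE and the struct kvm_mmu_page state used in the series:

#include <stdbool.h>
#include <stdio.h>

#define BIT(n) (1u << (n))

/* Assumed values; in the series the first two come from KVM_PROCESS_*. */
enum kvm_tdp_mmu_root_types {
        KVM_SHARED_ROOTS    = BIT(0),
        KVM_PRIVATE_ROOTS   = BIT(1),
        KVM_VALID_ROOTS     = BIT(2),
        KVM_ANY_VALID_ROOTS = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS | KVM_VALID_ROOTS,
        KVM_ANY_ROOTS       = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS,
};

/* Stand-in for struct kvm_mmu_page: only the state the filter looks at. */
struct mock_root {
        bool invalid;    /* root->role.invalid */
        bool is_private; /* is_private_sp(root) */
};

/* Mirrors the filtering logic of tdp_mmu_root_match(). */
static bool root_match(const struct mock_root *root, enum kvm_tdp_mmu_root_types types)
{
        if ((types & KVM_VALID_ROOTS) && root->invalid)
                return false;
        if ((types & KVM_SHARED_ROOTS) && !root->is_private)
                return true;
        if ((types & KVM_PRIVATE_ROOTS) && root->is_private)
                return true;
        return false;
}

int main(void)
{
        struct mock_root shared_valid = { .invalid = false, .is_private = false };
        struct mock_root private_invalid = { .invalid = true, .is_private = true };

        /* A valid shared root matches KVM_ANY_VALID_ROOTS; an invalid private root does not. */
        printf("%d %d\n",
               root_match(&shared_valid, KVM_ANY_VALID_ROOTS),
               root_match(&private_invalid, KVM_ANY_VALID_ROOTS));
        return 0;
}

This composition is what lets existing call sites keep passing KVM_ANY_ROOTS or KVM_ANY_VALID_ROOTS unchanged, while TDX-aware code can narrow an operation to KVM_SHARED_ROOTS or KVM_PRIVATE_ROOTS.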
Suggested-by: Sean Christopherson Link: https://lore.kernel.org/kvm/ZivazWQw1oCU8VBC@google.com/ [1] Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++++++- arch/x86/kvm/mmu/tdp_mmu.h | 14 ++++++++++---- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 7af395073e92..8914c5b0d5ab 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -95,10 +95,20 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) static bool tdp_mmu_root_match(struct kvm_mmu_page *root, enum kvm_tdp_mmu_root_types types) { + if (WARN_ON_ONCE(types == BUGGY_KVM_ROOTS)) + return false; + if (WARN_ON_ONCE(!(types & (KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS)))) + return false; + if ((types & KVM_VALID_ROOTS) && root->role.invalid) return false; - return true; + if ((types & KVM_SHARED_ROOTS) && !is_private_sp(root)) + return true; + if ((types & KVM_PRIVATE_ROOTS) && is_private_sp(root)) + return true; + + return false; } /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 30f2ab88a642..6a65498b481c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -20,12 +20,18 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root) void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root); enum kvm_tdp_mmu_root_types { - KVM_VALID_ROOTS = BIT(0), - - KVM_ANY_ROOTS = 0, - KVM_ANY_VALID_ROOTS = KVM_VALID_ROOTS, + BUGGY_KVM_ROOTS = BUGGY_KVM_INVALIDATION, + KVM_SHARED_ROOTS = KVM_PROCESS_SHARED, + KVM_PRIVATE_ROOTS = KVM_PROCESS_PRIVATE, + KVM_VALID_ROOTS = BIT(2), + KVM_ANY_VALID_ROOTS = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS | KVM_VALID_ROOTS, + KVM_ANY_ROOTS = KVM_SHARED_ROOTS | KVM_PRIVATE_ROOTS, }; +static_assert(!(KVM_SHARED_ROOTS & KVM_VALID_ROOTS)); +static_assert(!(KVM_PRIVATE_ROOTS & KVM_VALID_ROOTS)); +static_assert(KVM_PRIVATE_ROOTS == (KVM_SHARED_ROOTS << 1)); + bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); From patchwork Wed May 15 00:59:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664506 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FAED383BE; Wed, 15 May 2024 01:00:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734815; cv=none; b=jAFp81kyV/2VTG1nVT1n2dIL4ewSSKjzeZ1+DZc4iMUWjqs3fQV1hT27EU7D/GnvFngRogoMVgzBjPquK5/sNkkJQhgkGbJ1wMHlSRlRL7fJ63E35ziaBgYvr/33EArwu6KYeR2kcTYYrYisQYArtcDc5TwuIF7r2zP+GrEyMWo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734815; c=relaxed/simple; bh=YcYrPvm62cI+4A1/6iEWc+sBYrpX+wYdWWAThta5Qro=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=mu+ejgMSIc0m3S1pwwvVtUgPgneshzVyRUhN4GxxtqajzXrT7bMsw+I+kgQbbYQrxuIf670X/HPvqkd3DigIf8eWNQ45oOS8tbLg6Co2WMrs2+qCpullv8tQKSkLirjS6UsuIY48TMg3EVGXMErngPn403vUppw5t1XUFX8Knts= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none 
dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=N/832aU/; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="N/832aU/" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734814; x=1747270814; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=YcYrPvm62cI+4A1/6iEWc+sBYrpX+wYdWWAThta5Qro=; b=N/832aU/jx5M5XU4K0OWyY7yufcqL8XkRl/GGhRZLBS2S+Y9bIiHMdOt 570c059o3WKshg50Ed9KC08TK5IKuteWXys+HGtV5zSiMhgo68sDyGANt lTAY1Y14/jE3ipLboyeKr2s1HXKalQd3ZlpkxWNyH6Qtk07VQaMN9nFoP KTpBj37RGNfjRBSgVT8Kt8W1IK+qP5YCzUHGidsoYXOhP5nwOJDpzUKOg z8u5sZJOqRCPmTft2yfF8p/6DJ7QLFiaXCCqAGpk3EIUsfS/XrdGoV1xd 2R2LhL2UaEvVTXkR7Ub8yv61Lt4TgCO2wi8WOKdZqkKuBcaR8EtDen4Ah w==; X-CSE-ConnectionGUID: 74Hcq4I6Qg6EDRa60m1qFQ== X-CSE-MsgGUID: rL4IxInCRKKRGbknql24oA== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613990" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613990" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:08 -0700 X-CSE-ConnectionGUID: kG7EbLaVSrepTn16J4iMxg== X-CSE-MsgGUID: 4BA517K8RgukFZNAhdnQcQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942837" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:07 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 14/16] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Date: Tue, 14 May 2024 17:59:50 -0700 Message-Id: <20240515005952.3410568-15-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Rename kvm_tdp_mmu_invalidate_all_roots() to kvm_tdp_mmu_invalidate_roots(), and make it enum kvm_tdp_mmu_root_types as an argument. Have the callers only invalidate the required roots instead of all roots. Suggested-by: Chao Gao Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - New patch --- arch/x86/kvm/mmu/mmu.c | 9 +++++++-- arch/x86/kvm/mmu/tdp_mmu.c | 5 +++-- arch/x86/kvm/mmu/tdp_mmu.h | 3 ++- 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2506d6277818..338628094ad7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6414,8 +6414,13 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) * write and in the same critical section as making the reload request, * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and yield. 
*/ - if (tdp_mmu_enabled) - kvm_tdp_mmu_invalidate_all_roots(kvm); + if (tdp_mmu_enabled) { + /* + * The private page tables doesn't support fast zapping. The + * caller should handle it by other way. + */ + kvm_tdp_mmu_invalidate_roots(kvm, KVM_SHARED_ROOTS); + } /* * Notify all vcpus to reload its shadow page table and flush TLB. diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 8914c5b0d5ab..eb88af48c8f0 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -37,7 +37,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) * for zapping and thus puts the TDP MMU's reference to each root, i.e. * ultimately frees all roots. */ - kvm_tdp_mmu_invalidate_all_roots(kvm); + kvm_tdp_mmu_invalidate_roots(kvm, KVM_ANY_ROOTS); kvm_tdp_mmu_zap_invalidated_roots(kvm); WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages)); @@ -1170,7 +1170,8 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm) * Note, kvm_tdp_mmu_zap_invalidated_roots() is gifted the TDP MMU's reference. * See kvm_tdp_mmu_alloc_root(). */ -void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) +void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm, + enum kvm_tdp_mmu_root_types types) { struct kvm_mmu_page *root; diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 6a65498b481c..b8a967426fac 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -35,7 +35,8 @@ static_assert(KVM_PRIVATE_ROOTS == (KVM_SHARED_ROOTS << 1)); bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush); bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp); void kvm_tdp_mmu_zap_all(struct kvm *kvm); -void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm); +void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm, + enum kvm_tdp_mmu_root_types types); void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm); int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); From patchwork Wed May 15 00:59:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664507 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CEB239847; Wed, 15 May 2024 01:00:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734815; cv=none; b=NL0n0AEL5re1ERlbALnxky0vcsAFB45RfTYPBkBYK0BW4xduT792ZCrTN1ZEDnRlmuZ5ZJqdFlnUmDVh+d3Xd9VPWtG202g6/0+LnjnPGELdmWPwPF3TUOib6jXl5m3dLQp+BXebTcCqxe2D7xtSqy3qlDegdactnsq8QUDk+sE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734815; c=relaxed/simple; bh=ncuxWk4n+cq8Op23zgOE/XZqoYu8BbW/SQHwcjWe9Yk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=I2lodamewsTpymNXLWxzHNT8/kn53yvd/lgKDaE0SCLpfppAsVusrTZxHhCejoHD19714ft1auNnhq1vWPx9H+a8E5lSCD8jiNtI3evk12RS/JkuWXcs6iDg+yG2q8RlWhcp5O0Lfe8wKd/BLxYEmPAvVKkc6ZGAbS45FCBQOVQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OUaaLWMV; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OUaaLWMV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734814; x=1747270814; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ncuxWk4n+cq8Op23zgOE/XZqoYu8BbW/SQHwcjWe9Yk=; b=OUaaLWMVcXZBsphfIUntX13iQ8oMA/cEf38S7JzYjAX5+9xCzQbm3Up/ vaw3X5Gmj8lanBGfp05mK0M5gsJoHx5JQVPFHS/ahL82EA2YWGLGceNV5 z8E06nJNiZhrPB2a1bWHW0APnn9sr4lCOk4/1/4f4Ih46MvcGRmhqQxGB Aaynz9hyNZC4kyw+6qlPTJqmlC3Ev1YeNrxy6SdyDWJ6jLMEecKbFFBmi HoqAmU368ncKgu/roTkbKty/lSt45cwBOd4+X2yoRY2WDNUAP/2BqG+HR O1uPPzDxBuZl+7bWdFM+LUi1D26V4l9QoubpXaOP7U3vgjSQSRGUxX2lx A==; X-CSE-ConnectionGUID: 0rgenTWKQ3OPWcvSoiPrsw== X-CSE-MsgGUID: LrHh3l5uS8ezYxp+Ud+TfQ== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613994" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613994" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:09 -0700 X-CSE-ConnectionGUID: dn1mAM6GR0exhGcx8v+U5A== X-CSE-MsgGUID: bNhNrPqHSdqaemDkCQmK6w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942853" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:08 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 15/16] KVM: x86/tdp_mmu: Make mmu notifier callbacks check kvm_process Date: Tue, 14 May 2024 17:59:51 -0700 Message-Id: <20240515005952.3410568-16-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Isaku Yamahata Teach the MMU notifier callbacks how to check kvm_gfn_range.process to filter which KVM MMU root types to operate on. Private GPAs are backed by guest memfd. Such memory is not subject to MMU notifier callbacks because it can't be mapped into the host user address space. Now that kvm_gfn_range conveys which roots to operate on, enhance the callbacks to filter by root page table type. The KVM MMU notifier comes down to two functions: kvm_tdp_mmu_unmap_gfn_range() and kvm_tdp_mmu_handle_gfn(). For VMs without a private/shared split in the EPT, all operations should target the normal (shared) root. Adjust the target roots based on kvm_gfn_shared_mask(). invalidate_range_start() reaches kvm_tdp_mmu_unmap_gfn_range(); invalidate_range_end() doesn't reach arch code.
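For illustration, a rough stand-alone sketch of the intended mapping from kvm_gfn_range.process to root types (the real conversion is kvm_process_to_root_types() in the diff below); the KVM_PROCESS_* bit values and the has_shared_mask parameter standing in for kvm_gfn_shared_mask() are assumptions:

#include <stdio.h>

#define BIT(n) (1u << (n))

/* Assumed bit values for the process/root flags used in this series. */
enum kvm_process { KVM_PROCESS_SHARED = BIT(0), KVM_PROCESS_PRIVATE = BIT(1) };
enum kvm_tdp_mmu_root_types { KVM_SHARED_ROOTS = BIT(0), KVM_PRIVATE_ROOTS = BIT(1) };

/*
 * Mirrors kvm_process_to_root_types(): without a private/shared split
 * (non-TDX VMs), every request is redirected to the normal (shared) root.
 */
static enum kvm_tdp_mmu_root_types
process_to_root_types(int has_shared_mask, enum kvm_process process)
{
        if (!has_shared_mask) {
                process |= KVM_PROCESS_SHARED;
                process &= ~KVM_PROCESS_PRIVATE;
        }
        return (enum kvm_tdp_mmu_root_types)process;
}

int main(void)
{
        /* A private-only request on a non-TDX VM falls back to the shared root: prints 1. */
        printf("%d\n", process_to_root_types(0, KVM_PROCESS_PRIVATE) == KVM_SHARED_ROOTS);
        /* On a TDX VM the requested root types pass through unchanged: prints 1. */
        printf("%d\n", process_to_root_types(1, KVM_PROCESS_PRIVATE) == KVM_PRIVATE_ROOTS);
        return 0;
}

With that conversion in place, both kvm_tdp_mmu_unmap_gfn_range() and kvm_tdp_mmu_handle_gfn() can iterate only the roots the notifier actually cares about, which for guests without a shared mask remains exactly the shared root they operate on today.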
Signed-off-by: Isaku Yamahata Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - Remove warning (Rick) - Remove confusing mention of mapping flags (Chao) - Re-write coverletter v19: - type: test_gfn() => test_young() v18: - newly added --- arch/x86/kvm/mmu/tdp_mmu.c | 40 +++++++++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index eb88af48c8f0..af61d131d2dc 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1396,12 +1396,32 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) return ret; } +static enum kvm_tdp_mmu_root_types kvm_process_to_root_types(struct kvm *kvm, + enum kvm_process process) +{ + WARN_ON_ONCE(process == BUGGY_KVM_INVALIDATION); + + /* Always process shared for cases where private is not on a separate root */ + if (!kvm_gfn_shared_mask(kvm)) { + process |= KVM_PROCESS_SHARED; + process &= ~KVM_PROCESS_PRIVATE; + } + + return (enum kvm_tdp_mmu_root_types)process; +} + +/* Used by mmu notifier via kvm_unmap_gfn_range() */ bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range, bool flush) { + enum kvm_tdp_mmu_root_types types = kvm_process_to_root_types(kvm, range->process); struct kvm_mmu_page *root; - __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, KVM_ANY_ROOTS) + /* kvm_process_to_root_types() has WARN_ON_ONCE(). Don't warn it again. */ + if (types == BUGGY_KVM_ROOTS) + return flush; + + __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, types) flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end, range->may_block, flush); @@ -1415,18 +1435,32 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm, struct kvm_gfn_range *range, tdp_handler_t handler) { + enum kvm_tdp_mmu_root_types types = kvm_process_to_root_types(kvm, range->process); struct kvm_mmu_page *root; struct tdp_iter iter; bool ret = false; + if (types == BUGGY_KVM_ROOTS) + return ret; + /* * Don't support rescheduling, none of the MMU notifiers that funnel * into this helper allow blocking; it'd be dead, wasteful code. */ - for_each_tdp_mmu_root(kvm, root, range->slot->as_id) { + __for_each_tdp_mmu_root(kvm, root, range->slot->as_id, types) { + gfn_t start, end; + + /* + * For TDX shared mapping, set GFN shared bit to the range, + * so the handler() doesn't need to set it, to avoid duplicated + * code in multiple handler()s. 
+ */ + start = kvm_gfn_for_root(kvm, root, range->start); + end = kvm_gfn_for_root(kvm, root, range->end); + rcu_read_lock(); - tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) + tdp_root_for_each_leaf_pte(iter, root, start, end) ret |= handler(kvm, &iter, range); rcu_read_unlock(); From patchwork Wed May 15 00:59:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Rick Edgecombe X-Patchwork-Id: 13664508 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7491939FD3; Wed, 15 May 2024 01:00:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734816; cv=none; b=kyqbo/R2V7QLWDz9iaY6JDtVuBFg1YoaeEJM4d5xKoQrDr7hXQv80I21mrgixFeaq5oI/JQLDCW0b3ouOe6f3J6VihefeV00lOAfWC2KSEsilKj43/WX5uYlVkqWP5GGMkxcoHR9tXu7NpUcXpQXlCR3cz9SOwnqxxbwXeJB2sI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715734816; c=relaxed/simple; bh=aj7fEmDeepMPOarJYfKiRSUouqqIYmoqtkZdy4tUeqg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=aJpoZLeYkJ+RSxYV6cpe2AyT6nvktSyB7PRzT4ZmLmxlGTjAOHeaAmVr8FS+eRnXqfm7wPVFRzpUdRBaDKhXM6VLRcmC2RB8oumyM1hfnX9V2k+6kQ9zHDp/fJALV8RMSTspGKZkuy/E4C2jREPiCCsZhRqu+iTveUIJ6dasV7I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=QdpfjLYS; arc=none smtp.client-ip=192.198.163.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="QdpfjLYS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1715734815; x=1747270815; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aj7fEmDeepMPOarJYfKiRSUouqqIYmoqtkZdy4tUeqg=; b=QdpfjLYStE+yiWVUfX3BnQ0+hygbzbbp3hpbUA9trrdgwXR6Ja3MDx5W tpF4W+udcuMJntxVNYdPIzJU83YMZ7+e/cL61I1zPrM4zz8Exjr64llJO SNBY4gm/hIOF/G6zseqL+v/Q0ipupi1jK2juwBzxdCP2KHqu69Ch1NPjE oeaSBEDOY5dPJlbFxe8FX8/rG5MITBnLsCYPeiO8/6Bt2sINolOkucydM E3kW6ULyXORt1ABQ1UmroGazathgm9SRRqdQPudNE1bV32ciw0lAb6sGl soMkkPO93PxFsig1qGMQ/TkUdgEarKMUoFOa27eGL2/QssKjUsBCTB3qp Q==; X-CSE-ConnectionGUID: a8KbWV+hRqWXNTA0oiD8Zg== X-CSE-MsgGUID: gQZq+WRRSbuY1akc02rQQg== X-IronPort-AV: E=McAfee;i="6600,9927,11073"; a="11613999" X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="11613999" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa113.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:10 -0700 X-CSE-ConnectionGUID: TFQG3a5BRwWlS7yGvC+bXg== X-CSE-MsgGUID: awm+P7luTueABNtx8YYR7g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,160,1712646000"; d="scan'208";a="30942860" Received: from oyildiz-mobl1.amr.corp.intel.com (HELO rpedgeco-desk4.intel.com) ([10.209.51.34]) by fmviesa006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 May 2024 18:00:08 -0700 From: Rick Edgecombe To: pbonzini@redhat.com, 
seanjc@google.com, kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, erdemaktas@google.com, sagis@google.com, yan.y.zhao@intel.com, dmatlack@google.com, rick.p.edgecombe@intel.com Subject: [PATCH 16/16] KVM: x86/tdp_mmu: Invalidate correct roots Date: Tue, 14 May 2024 17:59:52 -0700 Message-Id: <20240515005952.3410568-17-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> References: <20240515005952.3410568-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Sean Christopherson When invalidating roots, respect the root type passed. kvm_tdp_mmu_invalidate_roots() is called with different root types. For kvm_mmu_zap_all_fast() it only operates on shared roots. But when tearing down a TD it needs to invalidate all roots. Check the root type in root iterator. Signed-off-by: Sean Christopherson Co-developed-by: Isaku Yamahata Signed-off-by: Isaku Yamahata [evolved quite a bit from original author's patch] Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe --- TDX MMU Part 1: - Rename from "Don't zap private pages for unsupported cases", and split many parts out. - Don't support MTRR, apic zapping (Rick) - Detangle private/shared alias logic in kvm_tdp_mmu_unmap_gfn_range() (Rick) - Fix TLB flushing bug debugged by (Chao Gao) https://lore.kernel.org/kvm/Zh8yHEiOKyvZO+QR@chao-email/ - Split out MTRR part - Use enum based root iterators (Sean) - Reorder logic in kvm_mmu_zap_memslot_leafs(). - Replace skip_private with enum kvm_tdp_mmu_root_type. --- arch/x86/kvm/mmu/tdp_mmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index af61d131d2dc..42ccafc7deff 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1196,6 +1196,9 @@ void kvm_tdp_mmu_invalidate_roots(struct kvm *kvm, * or get/put references to roots. */ list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { + if (!tdp_mmu_root_match(root, types)) + continue; + /* * Note, invalid roots can outlive a memslot update! Invalid * roots must be *zapped* before the memslot update completes,