Message ID | 20240620193701.374519-1-rick.p.edgecombe@intel.com (mailing list archive)
---|---
State | New, archived
Series | KVM: x86/mmu: Implement memslot deletion for TDX
On Thu, 2024-06-20 at 12:37 -0700, Rick Edgecombe wrote:
> Here is the patch for TDX integration. It is not needed until we can
> actually create KVM_X86_TDX_VMs.

It depends on this series as well:
https://lore.kernel.org/kvm/e693adab-9fa3-47fd-b62f-c3f2589ffe7f@linux.intel.com/

So it should not be included when applying just this series (applying it alone
will produce a build error).
On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
>
> TDs cannot use the fast zapping operation to implement memslot deletion for
> a couple reasons:
> 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating

Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is
fuzzy and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g.
BLOCK+UNBLOCK. It's specifically the full teardown and rebuild of the "fast
zap" that doesn't play nice, as the non-leaf S-EPT entries *must* be preserved
due to how the TDX module does its refcounting.
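To make the leaf vs. non-leaf distinction above concrete, here is a toy
standalone C model; it is not kernel code, and the structure and function
names are invented for illustration. A leaf-only zap clears the last-level
entries but leaves the paging structure intact, so mappings can be re-faulted
later, whereas a fast-zap-style teardown frees the non-leaf pages that the TDX
module's refcounting requires to stay in place.

#include <stdlib.h>

#define ENTRIES 8 /* tiny fan-out; a real EPT level has 512 entries */

/* Stand-in for one level of the S-EPT; purely illustrative. */
struct toy_table {
	struct toy_table *child[ENTRIES]; /* non-leaf entries */
	unsigned long leaf[ENTRIES];      /* leaf "PTEs" */
};

/*
 * BLOCK-style: clear only the leaf entries; non-leaf pages survive and
 * the mappings can be re-faulted later. This is the TDX-compatible shape.
 */
static void zap_leafs(struct toy_table *t)
{
	for (int i = 0; i < ENTRIES; i++) {
		if (t->child[i])
			zap_leafs(t->child[i]);
		t->leaf[i] = 0;
	}
}

/*
 * Fast-zap-style: tear the whole tree down, non-leaf pages included.
 * For the S-EPT this is exactly what the TDX module's refcounting forbids.
 */
static void zap_all_fast(struct toy_table *t)
{
	for (int i = 0; i < ENTRIES; i++)
		if (t->child[i])
			zap_all_fast(t->child[i]);
	free(t);
}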
On Thu, 2024-06-20 at 17:08 -0700, Sean Christopherson wrote:
> On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> > Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
> >
> > TDs cannot use the fast zapping operation to implement memslot deletion for
> > a couple reasons:
> > 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
>
> Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is
> fuzzy and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g.
> BLOCK+UNBLOCK. It's specifically the full teardown and rebuild of the "fast
> zap" that doesn't play nice, as the non-leaf S-EPT entries *must* be
> preserved due to how the TDX module does its refcounting.

Hmm, yea. That is probably worth an update. I'll change it when I post another
version of this patch.

I was imagining this series might go up ahead of the rest of the MMU prep
stuff. In that case I can just post a new version of this patch on top of
kvm-coco-queue once this series appears in the base of that branch. Assuming
there are no problems with that, I won't post a v2 right away.
On Thu, Jun 20, 2024 at 05:08:37PM -0700, Sean Christopherson wrote:
> On Thu, Jun 20, 2024, Rick Edgecombe wrote:
> > Force TDX VMs to use the KVM_X86_QUIRK_SLOT_ZAP_ALL behavior.
> >
> > TDs cannot use the fast zapping operation to implement memslot deletion for
> > a couple reasons:
> > 1. KVM cannot zap TDX private PTEs and re-fault them without coordinating
>
> Uber nit, this isn't strictly true, for KVM's definition of "zap" (which is
> fuzzy and overloaded). KVM _could_ zap and re-fault *leaf* PTEs, e.g.
> BLOCK+UNBLOCK. It's specifically the full teardown and rebuild of the "fast
> zap" that doesn't play nice, as the non-leaf S-EPT entries *must* be
> preserved due to how the TDX module does its refcounting.

Actually, a slower "zap all" for TDX was also considered, e.g.
1) fully drop the leaf entries within the range of the memslot
2) just block all other leaf entries

But given that the current TDX MMU series does not provide a block-only
implementation, that proposal was aborted internally :)

BTW, there's another proposal, not sure if you would consider it.

We could still let users control whether the quirk is enabled or not (i.e.
KVM does not enforce disabling the quirk). If the user does not disable the
quirk for TDX, then kvm_mmu_zap_all_fast() will be called to invalidate all
shared EPT entries.

Meanwhile, could we trigger kvm_gmem_invalidate*() in kvm_gmem_unbind() as
well? Then the private EPT would also be guaranteed to be zapped before
memslot removal.

The concern with this approach is that we would need to do similar
invalidation work in the future, when memslots besides gmem can also be
mapped into the private EPT.
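For illustration only, a rough sketch of how the kvm_gmem_unbind() idea above
might look. kvm_gmem_invalidate_begin()/kvm_gmem_invalidate_end() are existing
helpers in virt/kvm/guest_memfd.c, but wiring them into kvm_gmem_unbind() like
this is just the proposal being floated, not merged code, and the locking and
field details are glossed over:

/* Sketch only: the proposal above, not merged code. */
static void kvm_gmem_unbind(struct kvm_memory_slot *slot)
{
	struct file *file = slot->gmem.file;
	struct kvm_gmem *gmem = file->private_data;
	pgoff_t start = slot->gmem.pgoff;
	pgoff_t end = start + slot->npages;

	/*
	 * Zap the private EPT for the slot's range before the binding goes
	 * away, so memslot removal never leaves stale private mappings.
	 */
	kvm_gmem_invalidate_begin(gmem, start, end);
	kvm_gmem_invalidate_end(gmem, start, end);

	/* ... existing unbind logic: erase the xarray binding, fput(), ... */
}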
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7b12ba761c51..72ed6c07719a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -335,4 +335,10 @@ static inline bool kvm_is_addr_direct(struct kvm *kvm, gpa_t gpa)
 	return !gpa_direct_bits || (gpa & gpa_direct_bits);
 }
+
+static inline bool kvm_memslot_flush_zap_all(struct kvm *kvm)
+{
+	return kvm->arch.vm_type == KVM_X86_DEFAULT_VM &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+}
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 42faad76806a..8212bf77af70 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6993,6 +6993,7 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
 		.start = slot->base_gfn,
 		.end = slot->base_gfn + slot->npages,
 		.may_block = true,
+		.attr_filter = KVM_FILTER_PRIVATE | KVM_FILTER_SHARED,
 	};
 	bool flush = false;
@@ -7013,7 +7014,7 @@ static void kvm_mmu_zap_memslot_leafs(struct kvm *kvm, struct kvm_memory_slot *s
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
-	if (kvm_check_has_quirk(kvm, KVM_X86_QUIRK_SLOT_ZAP_ALL))
+	if (kvm_memslot_flush_zap_all(kvm))
 		kvm_mmu_zap_all_fast(kvm);
 	else
 		kvm_mmu_zap_memslot_leafs(kvm, slot);
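For completeness, the userspace side of the quirk: with the hunks above, a TDX
VM never takes the kvm_mmu_zap_all_fast() path regardless of the quirk, while
a default VM can still opt out of the fast-zap behavior by disabling
KVM_X86_QUIRK_SLOT_ZAP_ALL through the existing KVM_CAP_DISABLE_QUIRKS2
interface. A minimal sketch, assuming vm_fd is an already-open VM file
descriptor and that the uapi headers carry the quirk bit from this series:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Disable the slot-zap-all quirk on a VM; returns 0 on success. */
static int disable_slot_zap_all(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DISABLE_QUIRKS2,
		.args[0] = KVM_X86_QUIRK_SLOT_ZAP_ALL, /* bitmask of quirks to disable */
	};

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}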