
[v3,kvm/queue,11/16] KVM: Add kvm_map_gfn_range

Message ID 20211223123011.41044-12-chao.p.peng@linux.intel.com (mailing list archive)
State New
Series KVM: mm: fd-based approach for supporting KVM guest private memory

Commit Message

Chao Peng Dec. 23, 2021, 12:30 p.m. UTC
This new function establishes the mapping in KVM page tables for a
given gfn range. It can be used in the memory fallocate callback for
memfd based memory to establish the mapping for KVM secondary MMU when
the pages are allocated in the memory backend.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 arch/x86/kvm/mmu/mmu.c   | 47 ++++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      |  5 +++++
 3 files changed, 54 insertions(+)
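
As context for the intended usage, a rough sketch of how a memfd fallocate notification could invoke the new helper; the kvm_memfd_fallocate() hook name and the offset-to-gfn conversion are assumptions, not part of this patch (and, as the review below notes, mapping on fallocate was ultimately rejected):

/*
 * Hypothetical illustration only: pre-populate the secondary MMU when pages
 * are allocated in the backing file.  Assumes the memslot maps the file
 * starting at offset 0.
 */
static void kvm_memfd_fallocate(struct kvm *kvm, struct kvm_memory_slot *slot,
				pgoff_t start, pgoff_t end)
{
	struct kvm_gfn_range range = {
		.slot      = slot,
		.start     = slot->base_gfn + start,
		.end       = slot->base_gfn + end,
		.may_block = true,
	};

	kvm_map_gfn_range(kvm, &range);
}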

Comments

Sean Christopherson Dec. 23, 2021, 6:06 p.m. UTC | #1
On Thu, Dec 23, 2021, Chao Peng wrote:
> This new function establishes the mapping in KVM page tables for a
> given gfn range. It can be used in the memory fallocate callback for
> memfd based memory to establish the mapping for KVM secondary MMU when
> the pages are allocated in the memory backend.

NAK, under no circumstance should KVM install SPTEs in response to allocating
memory in a file.   The correct thing to do is to invalidate the gfn range
associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
with the memslot.
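
To make the suggested alternative concrete, a minimal sketch (the wrapper name, the gfn parameters, and the use of write_lock() on x86's rwlock-based mmu_lock are assumptions; kvm_unmap_gfn_range() and kvm_flush_remote_tlbs() are existing KVM functions):

/*
 * Invalidate-on-fallocate sketch: zap any shared SPTEs covering the newly
 * allocated range so the next guest access faults in the private variant.
 */
static void kvm_private_mem_fallocate(struct kvm *kvm,
				      struct kvm_memory_slot *slot,
				      gfn_t start, gfn_t end)
{
	struct kvm_gfn_range range = {
		.slot      = slot,
		.start     = start,
		.end       = end,
		.may_block = true,
	};

	write_lock(&kvm->mmu_lock);
	if (kvm_unmap_gfn_range(kvm, &range))
		kvm_flush_remote_tlbs(kvm);
	write_unlock(&kvm->mmu_lock);
}
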
Chao Peng Dec. 24, 2021, 4:13 a.m. UTC | #2
On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> On Thu, Dec 23, 2021, Chao Peng wrote:
> > This new function establishes the mapping in KVM page tables for a
> > given gfn range. It can be used in the memory fallocate callback for
> > memfd based memory to establish the mapping for KVM secondary MMU when
> > the pages are allocated in the memory backend.
> 
> NAK, under no circumstance should KVM install SPTEs in response to allocating
> memory in a file.   The correct thing to do is to invalidate the gfn range
> associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> with the memslot.

Right, thanks.
Chao Peng Dec. 31, 2021, 2:33 a.m. UTC | #3
On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote:
> On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > This new function establishes the mapping in KVM page tables for a
> > > given gfn range. It can be used in the memory fallocate callback for
> > > memfd based memory to establish the mapping for KVM secondary MMU when
> > > the pages are allocated in the memory backend.
> > 
> > NAK, under no circumstance should KVM install SPTEs in response to allocating
> > memory in a file.   The correct thing to do is to invalidate the gfn range
> > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> > with the memslot.
> 
> Right, thanks.

BTW, I think the current fallocate() callback is just useless as long as
we don't want to install KVM SPTEs in response to allocating memory in a
file. The invalidation of the shared SPTEs should be notified through 
mmu_notifier of the shared memory backend, not memfd_notifier of the
private memory backend.

Thanks,
Chao
Sean Christopherson Jan. 4, 2022, 5:31 p.m. UTC | #4
On Fri, Dec 31, 2021, Chao Peng wrote:
> On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote:
> > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> > > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > > This new function establishes the mapping in KVM page tables for a
> > > > given gfn range. It can be used in the memory fallocate callback for
> > > > memfd based memory to establish the mapping for KVM secondary MMU when
> > > > the pages are allocated in the memory backend.
> > > 
> > > NAK, under no circumstance should KVM install SPTEs in response to allocating
> > > memory in a file.   The correct thing to do is to invalidate the gfn range
> > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> > > with the memslot.
> > 
> > Right, thanks.
> 
> BTW, I think the current fallocate() callback is just useless as long as
> we don't want to install KVM SPTEs in response to allocating memory in a
> file. The invalidation of the shared SPTEs should be notified through 
> mmu_notifier of the shared memory backend, not memfd_notifier of the
> private memory backend.

No, because the private fd is the final source of truth as to whether or not a
GPA is private, e.g. userspace may choose to not unmap the shared backing.
KVM's rule per Paolo's/this proposal is that a GPA is private if it has a private
memslot and is present in the private backing store.  And the other core rule is
that KVM must never map both the private and shared variants of a GPA into the
guest.
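
Restating that rule as a sketch (the helper names kvm_slot_is_private() and kvm_private_mem_is_backed() are hypothetical, not from the series):

/*
 * A GPA is private iff its memslot has a private backing fd AND the page is
 * currently allocated in that private backing store; otherwise KVM must use
 * the shared variant.  Only one of the two may ever be mapped into the guest.
 */
static bool kvm_gfn_is_private(struct kvm_memory_slot *slot, gfn_t gfn)
{
	if (!kvm_slot_is_private(slot))
		return false;

	return kvm_private_mem_is_backed(slot, gfn);
}
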
Chao Peng Jan. 5, 2022, 6:14 a.m. UTC | #5
On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote:
> On Fri, Dec 31, 2021, Chao Peng wrote:
> > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote:
> > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> > > > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > > > This new function establishes the mapping in KVM page tables for a
> > > > > given gfn range. It can be used in the memory fallocate callback for
> > > > > memfd based memory to establish the mapping for KVM secondary MMU when
> > > > > the pages are allocated in the memory backend.
> > > > 
> > > > NAK, under no circumstance should KVM install SPTEs in response to allocating
> > > > memory in a file.   The correct thing to do is to invalidate the gfn range
> > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> > > > with the memslot.
> > > 
> > > Right, thanks.
> > 
> > BTW, I think the current fallocate() callback is just useless as long as
> > we don't want to install KVM SPTEs in response to allocating memory in a
> > file. The invalidation of the shared SPTEs should be notified through 
> > mmu_notifier of the shared memory backend, not memfd_notifier of the
> > private memory backend.
> 
> No, because the private fd is the final source of truth as to whether or not a
> GPA is private, e.g. userspace may choose to not unmap the shared backing.
> KVM's rule per Paolo's/this proposal is that a GPA is private if it has a private
> memslot and is present in the private backing store.  And the other core rule is
> that KVM must never map both the private and shared variants of a GPA into the
> guest.

That's true, but I'm wondering if zapping the shared variant can be
handled at the time the private one gets mapped in the KVM page fault,
without the backing store having to dedicate a callback to tell KVM.

Chao
Sean Christopherson Jan. 5, 2022, 5:03 p.m. UTC | #6
On Wed, Jan 05, 2022, Chao Peng wrote:
> On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote:
> > On Fri, Dec 31, 2021, Chao Peng wrote:
> > > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote:
> > > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> > > > > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > > > > This new function establishes the mapping in KVM page tables for a
> > > > > > given gfn range. It can be used in the memory fallocate callback for
> > > > > > memfd based memory to establish the mapping for KVM secondary MMU when
> > > > > > the pages are allocated in the memory backend.
> > > > > 
> > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating
> > > > > memory in a file.   The correct thing to do is to invalidate the gfn range
> > > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> > > > > with the memslot.
> > > > 
> > > > Right, thanks.
> > > 
> > > BTW, I think the current fallocate() callback is just useless as long as
> > > we don't want to install KVM SPTEs in response to allocating memory in a
> > > file. The invalidation of the shared SPTEs should be notified through 
> > > mmu_notifier of the shared memory backend, not memfd_notifier of the
> > > private memory backend.
> > 
> > No, because the private fd is the final source of truth as to whether or not a
> > GPA is private, e.g. userspace may choose to not unmap the shared backing.
> > KVM's rule per Paolo's/this proposal is that a GPA is private if it has a private
> > memslot and is present in the private backing store.  And the other core rule is
> > that KVM must never map both the private and shared variants of a GPA into the
> > guest.
> 
> That's true, but I'm wondering if zapping the shared variant can be
> handled at the time the private one gets mapped in the KVM page fault,
> without the backing store having to dedicate a callback to tell KVM.

Hmm, I don't think that would work for the TDP MMU due to page faults taking
mmu_lock for read.  E.g. if two vCPUs concurrently fault in both the shared and
private variants, a race could exist where the private page fault sees the gfn
as private and the shared page fault sees it as shared.  In that case, both faults
will install a SPTE and KVM would end up running with both variants mapped into the
guest.

There's also a performance penalty, as KVM would need to walk the shared EPT tree
on every private page fault.
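
An illustrative interleaving of that race, assuming both faults take mmu_lock for read as the TDP MMU does (the timeline is a sketch, not taken from the thread):

  vCPU0: private page fault           vCPU1: shared page fault
  -------------------------           ------------------------
  read_lock(mmu_lock)
  sees gfn as private,
  zaps shared SPTE                    read_lock(mmu_lock)
                                      still sees gfn as shared
  installs private SPTE
                                      installs shared SPTE
  read_unlock(mmu_lock)               read_unlock(mmu_lock)

Neither fault takes mmu_lock for write, so nothing serializes the two checks
against the two installs, and the guest ends up with both variants mapped.
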
Chao Peng Jan. 6, 2022, 12:35 p.m. UTC | #7
On Wed, Jan 05, 2022 at 05:03:23PM +0000, Sean Christopherson wrote:
> On Wed, Jan 05, 2022, Chao Peng wrote:
> > On Tue, Jan 04, 2022 at 05:31:30PM +0000, Sean Christopherson wrote:
> > > On Fri, Dec 31, 2021, Chao Peng wrote:
> > > > On Fri, Dec 24, 2021 at 12:13:51PM +0800, Chao Peng wrote:
> > > > > On Thu, Dec 23, 2021 at 06:06:19PM +0000, Sean Christopherson wrote:
> > > > > > On Thu, Dec 23, 2021, Chao Peng wrote:
> > > > > > > This new function establishes the mapping in KVM page tables for a
> > > > > > > given gfn range. It can be used in the memory fallocate callback for
> > > > > > > memfd based memory to establish the mapping for KVM secondary MMU when
> > > > > > > the pages are allocated in the memory backend.
> > > > > > 
> > > > > > NAK, under no circumstance should KVM install SPTEs in response to allocating
> > > > > > memory in a file.   The correct thing to do is to invalidate the gfn range
> > > > > > associated with the newly mapped range, i.e. wipe out any shared SPTEs associated
> > > > > > with the memslot.
> > > > > 
> > > > > Right, thanks.
> > > > 
> > > > BTW, I think the current fallocate() callback is just useless as long as
> > > > we don't want to install KVM SPTEs in response to allocating memory in a
> > > > file. The invalidation of the shared SPTEs should be notified through 
> > > > mmu_notifier of the shared memory backend, not memfd_notifier of the
> > > > private memory backend.
> > > 
> > > No, because the private fd is the final source of truth as to whether or not a
> > > GPA is private, e.g. userspace may choose to not unmap the shared backing.
> > > KVM's rule per Paolo's/this proposal is that a GPA is private if it has a private
> > > memslot and is present in the private backing store.  And the other core rule is
> > > that KVM must never map both the private and shared variants of a GPA into the
> > > guest.
> > 
> > That's true, but I'm wondering if zapping the shared variant can be
> > handled at the time the private one gets mapped in the KVM page fault,
> > without the backing store having to dedicate a callback to tell KVM.
> 
> Hmm, I don't think that would work for the TDP MMU due to page faults taking
> mmu_lock for read.  E.g. if two vCPUs concurrently fault in both the shared and
> private variants, a race could exist where the private page fault sees the gfn
> as private and the shared page fault sees it as shared.  In that case, both faults
> will install a SPTE and KVM would end up running with both variants mapped into the
> guest.
> 
> There's also a performance penalty, as KVM would need to walk the shared EPT tree
> on every private page fault.

Makes sense.

Thanks,
Chao

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1d275e9d76b5..2856eb662a21 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1568,6 +1568,53 @@  static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
 	return ret;
 }
 
+bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	struct kvm_vcpu *vcpu;
+	kvm_pfn_t pfn;
+	gfn_t gfn;
+	int idx;
+	bool ret = true;
+
+	/* Need vcpu context for kvm_mmu_do_page_fault. */
+	vcpu = kvm_get_vcpu(kvm, 0);
+	if (mutex_lock_killable(&vcpu->mutex))
+		return false;
+
+	vcpu_load(vcpu);
+	idx = srcu_read_lock(&kvm->srcu);
+
+	kvm_mmu_reload(vcpu);
+
+	gfn = range->start;
+	while (gfn < range->end) {
+		if (signal_pending(current)) {
+			ret = false;
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		pfn = kvm_mmu_do_page_fault(vcpu, gfn << PAGE_SHIFT,
+					PFERR_WRITE_MASK | PFERR_USER_MASK,
+					false);
+		if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) {
+			ret = false;
+			break;
+		}
+
+		gfn++;
+	}
+
+	srcu_read_unlock(&kvm->srcu, idx);
+	vcpu_put(vcpu);
+
+	mutex_unlock(&vcpu->mutex);
+
+	return ret;
+}
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool flush = false;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index be567925831b..8c2359175509 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -241,6 +241,8 @@  struct kvm_gfn_range {
 	pte_t pte;
 	bool may_block;
 };
+
+bool kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f495c1a313bd..660ce15973ad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -471,6 +471,11 @@  EXPORT_SYMBOL_GPL(kvm_destroy_vcpus);
 #if defined(CONFIG_MEMFD_OPS) ||\
 	(defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER))
 
+bool __weak kvm_map_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	return false;
+}
+
 typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
 typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,