[v3,0/7] KVM: Guest Memory Pre-Population API

Message ID 20240417153450.3608097-1-pbonzini@redhat.com

Message

Paolo Bonzini April 17, 2024, 3:34 p.m. UTC
Pre-population has been requested several times to mitigate KVM page faults
during guest boot or after live migration.  It is also required by TDX
before filling in the initial guest memory with measured contents; while
I am not yet sure that userspace will use this ioctl directly, if it does
not, the code will be reused by a TDX-specific ioctl to pre-populate the
SEPT before invoking TDH.MEM.PAGE.ADD or TDH.MR.EXTEND.

Compared to Isaku's v2, I have reduced the scope as much as possible:

- no vendor-specific hooks

- just fail if pre-population is invoked while nested virt is active

- just populate page tables for the SMM address space if invoked while
  SMM is active


There are however other changes that affect the userspace API:

- struct name changed to `kvm_map_memory`

- added support for KVM_CHECK_EXTENSION(KVM_MAP_MEMORY) on the VM
  file descriptor, which makes it possible to support this ioctl
  only on a subset of VM types

- **important**: if EINTR or EAGAIN happens on the first page, it is
  returned.  Otherwise, the ioctl *succeeds* but mapping->size is left
  nonzero to indicate how much is still unmapped.  While this drops the
  detail as to why the system call was interrupted, it is consistent with
  other interruptible system calls such as read().  A userspace sketch of
  the resulting retry loop follows this list.
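To illustrate the capability check and the partial-completion semantics
above, here is a minimal userspace sketch; it is not part of the series.
It assumes a uapi header patched by the series, with struct kvm_map_memory
exposing the base_address and size fields used in this cover letter and in
the TDX snippet quoted in the comments below, and it assumes the capability
constant is named KVM_CAP_MAP_MEMORY.  Whether base_address is advanced as
the kernel makes progress is also an assumption; the documentation patch
has the authoritative contract.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: pre-populate [gpa, gpa + size) through the given vCPU.
 * Returns 0 on success or a negative errno.
 */
static int prepopulate(int vm_fd, int vcpu_fd, uint64_t gpa, uint64_t size)
{
	struct kvm_map_memory mapping = {
		.base_address = gpa,
		.size = size,
	};

	/* The capability is advertised per VM type, so query the VM fd. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAP_MEMORY) <= 0)
		return -ENOTSUP;

	while (mapping.size) {
		if (ioctl(vcpu_fd, KVM_MAP_MEMORY, &mapping) < 0) {
			/* Interrupted on the first page: nothing mapped yet, retry. */
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return -errno;
		}
		/*
		 * Success: either the whole range was mapped (size == 0) or
		 * the ioctl was interrupted after partial progress and size
		 * is what is still left to map; loop to finish the rest.
		 */
	}
	return 0;
}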

Implementation changes:

- the test is not x86-specific anymore (though for now it is only compiled
  on x86 because no other architecture supports the feature)

- instead of using __weak symbols, the code is conditional on a new
  Kconfig CONFIG_KVM_GENERIC_MAP_MEMORY.
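As a purely illustrative sketch of that pattern (not the series' actual
code): the KVM_MAP_MEMORY case in the generic vcpu ioctl dispatch can be
compiled only when the architecture selects the new symbol, so no __weak
fallback for kvm_arch_vcpu_map_memory() (implemented for x86 in patch 6)
is needed.  The hypothetical fragment below glosses over the retry loop
that the real generic code would need:

#ifdef CONFIG_KVM_GENERIC_MAP_MEMORY
	case KVM_MAP_MEMORY: {
		struct kvm_map_memory mapping;

		r = -EFAULT;
		if (copy_from_user(&mapping, argp, sizeof(mapping)))
			break;

		r = kvm_arch_vcpu_map_memory(vcpu, &mapping);

		/* Report back how much of the range is still unmapped. */
		if (copy_to_user(argp, &mapping, sizeof(mapping)))
			r = -EFAULT;
		break;
	}
#endif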


This patch series depends on the other pieces that have already been
applied to the kvm-coco-queue branch (and the series itself is present on
that branch).

Paolo


Isaku Yamahata (7):
  KVM: Document KVM_MAP_MEMORY ioctl
  KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()
  KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory
  KVM: x86: Implement kvm_arch_vcpu_map_memory()
  KVM: selftests: x86: Add test for KVM_MAP_MEMORY

 Documentation/virt/kvm/api.rst                |  54 +++++++
 arch/x86/kvm/Kconfig                          |   1 +
 arch/x86/kvm/mmu.h                            |   3 +
 arch/x86/kvm/mmu/mmu.c                        |  32 +++++
 arch/x86/kvm/mmu/mmu_internal.h               |  42 ++++--
 arch/x86/kvm/x86.c                            |  43 ++++++
 include/linux/kvm_host.h                      |   5 +
 include/uapi/linux/kvm.h                      |  10 ++
 tools/include/uapi/linux/kvm.h                |   8 ++
 tools/testing/selftests/kvm/Makefile          |   1 +
 tools/testing/selftests/kvm/map_memory_test.c | 135 ++++++++++++++++++
 virt/kvm/Kconfig                              |   3 +
 virt/kvm/kvm_main.c                           |  61 ++++++++
 13 files changed, 384 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/map_memory_test.c

Comments

Rick Edgecombe April 18, 2024, 12:01 a.m. UTC | #1
On Wed, 2024-04-17 at 11:34 -0400, Paolo Bonzini wrote:
> 
> Compared to Isaku's v2, I have reduced the scope as much as possible:
> 
> - no vendor-specific hooks

The TDX patches build on this, with the vendor callback looking like:

"
int tdx_pre_mmu_map_page(struct kvm_vcpu *vcpu,
			 struct kvm_map_memory *mapping,
			 u64 *error_code)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
	struct kvm *kvm = vcpu->kvm;

	if (!to_tdx(vcpu)->initialized)
		return -EINVAL;

	/* Shared-EPT case */
	if (!(kvm_is_private_gpa(kvm, mapping->base_address)))
		return 0;

	/* Once TD is finalized, the initial guest memory is fixed. */
	if (is_td_finalized(kvm_tdx))
		return -EINVAL;

	*error_code = TDX_SEPT_PFERR;
	return 0;
}
"

The kvm_is_private_gpa() check is already handled in this series.

The initialized and finalized checks are nice guard rails for userspace, but
shouldn't be strictly required.

The TDX_SEPT_PFERR is (PFERR_WRITE_MASK | PFERR_PRIVATE_ACCESS). The
PFERR_WRITE_MASK doesn't seem to be required. Isaku, what was the intention?

But, I think maybe we should add a hook back in the TDX series for the guard
rails.
Paolo Bonzini April 18, 2024, 12:31 a.m. UTC | #2
On April 18, 2024 at 02:01:03 CEST, "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:
>On Wed, 2024-04-17 at 11:34 -0400, Paolo Bonzini wrote:
>> 
>> Compared to Isaku's v2, I have reduced the scope as much as possible:
>> 
>> - no vendor-specific hooks
>
>The TDX patches build on this, with the vendor callback looking like:
>
>"
>int tdx_pre_mmu_map_page(struct kvm_vcpu *vcpu,
>			 struct kvm_map_memory *mapping,
>			 u64 *error_code)
>{
>	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
>	struct kvm *kvm = vcpu->kvm;
>
>	if (!to_tdx(vcpu)->initialized)
>		return -EINVAL;
>
>	/* Shared-EPT case */
>	if (!(kvm_is_private_gpa(kvm, mapping->base_address)))
>		return 0;
>
>	/* Once TD is finalized, the initial guest memory is fixed. */
>	if (is_td_finalized(kvm_tdx))
>		return -EINVAL;

This is wrong, KVM_MAP_MEMORY should be idempotent. But anyway, you can post what you have on top of kvm-coco-queue (i.e., adding the hook in your patches) and we will sort it out a piece at a time.

Paolo

Rick Edgecombe April 18, 2024, 12:33 a.m. UTC | #3
On Thu, 2024-04-18 at 02:31 +0200, Paolo Bonzini wrote:
> > The TDX patches build on this, with the vendor callback looking like:
> > 
> > "
> > int tdx_pre_mmu_map_page(struct kvm_vcpu *vcpu,
> >                          struct kvm_map_memory *mapping,
> >                          u64 *error_code)
> > {
> >         struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> >         struct kvm *kvm = vcpu->kvm;
> > 
> >         if (!to_tdx(vcpu)->initialized)
> >                 return -EINVAL;
> > 
> >         /* Shared-EPT case */
> >         if (!(kvm_is_private_gpa(kvm, mapping->base_address)))
> >                 return 0;
> > 
> >         /* Once TD is finalized, the initial guest memory is fixed. */
> >         if (is_td_finalized(kvm_tdx))
> >                 return -EINVAL;
> 
> This is wrong, KVM_MAP_MEMORY should be idempotent. But anyway, you can post
> what you have on top of kvm-coco-queue (i.e., adding the hook in your patches)
> and we will sort it out a piece at a time.

Hmm, I see your point.