
[1/6] KVM: Document KVM_PRE_FAULT_MEMORY ioctl

Message ID 20240419085927.3648704-2-pbonzini@redhat.com (mailing list archive)
State New, archived
Series KVM: Guest Memory Pre-Population API

Commit Message

Paolo Bonzini April 19, 2024, 8:59 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add documentation for the KVM_PRE_FAULT_MEMORY ioctl. [1]

The ioctl populates guest memory.  It does not perform any of the
underlying technology-specific initialization [2].  For example,
CoCo-related operations are not performed; concretely, for TDX this API
does not invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific
APIs are required for such operations.

The key point is to adopt a vcpu ioctl instead of a VM ioctl.  First,
populating guest memory requires a vcpu; with a VM ioctl, KVM would have
to pick one vcpu somehow.  Second, a vcpu ioctl lets each vcpu invoke the
ioctl in parallel, which helps it scale with guest memory size, e.g.
hundreds of GB.
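
To illustrate the parallelism argument, here is a minimal sketch of how a
VMM might carve a guest physical range into page-aligned slices, one per
vCPU thread, before each thread issues the ioctl on its own vcpu fd.
`struct gpa_chunk` and `split_gpa_range()` are hypothetical names used only
for illustration and are not part of this series; the even-split policy and
the caller-supplied page size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one page-aligned slice of guest physical memory. */
struct gpa_chunk {
	uint64_t gpa;
	uint64_t size;
};

/*
 * Return the slice of [base, base + total) assigned to worker `idx` out of
 * `nr` workers.  `base` and `total` are assumed to be multiples of
 * `page_size`.  Pages are spread as evenly as possible; the first
 * (pages % nr) workers receive one extra page.
 */
static struct gpa_chunk split_gpa_range(uint64_t base, uint64_t total,
					uint64_t page_size,
					unsigned int idx, unsigned int nr)
{
	assert(nr > 0 && idx < nr && page_size > 0);
	assert(base % page_size == 0 && total % page_size == 0);

	uint64_t pages = total / page_size;
	uint64_t per = pages / nr;
	uint64_t extra = pages % nr;
	/* Workers before idx hold `per` pages each, plus one per "extra" slot. */
	uint64_t first = idx * per + (idx < extra ? idx : extra);
	uint64_t count = per + (idx < extra ? 1 : 0);

	return (struct gpa_chunk){
		.gpa = base + first * page_size,
		.size = count * page_size,
	};
}
```

Each vCPU thread would then place its slice's `gpa` and `size` into
struct kvm_pre_fault_memory and call the ioctl independently.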

[1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
[2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst | 50 ++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Comments

Isaku Yamahata April 22, 2024, 5:55 p.m. UTC | #1
On Fri, Apr 19, 2024 at 04:59:22AM -0400,
Paolo Bonzini <pbonzini@redhat.com> wrote:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Adds documentation of KVM_PRE_FAULT_MEMORY ioctl. [1]
> 
> It populates guest memory.  It doesn't do extra operations on the
> underlying technology-specific initialization [2].  For example,
> CoCo-related operations won't be performed.  Concretely for TDX, this API
> won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific APIs
> are required for such operations.
> 
> The key point is to adapt of vcpu ioctl instead of VM ioctl.  First,
> populating guest memory requires vcpu.  If it is VM ioctl, we need to pick
> one vcpu somehow.  Secondly, vcpu ioctl allows each vcpu to invoke this
> ioctl in parallel.  It helps to scale regarding guest memory size, e.g.,
> hundreds of GB.
> 
> [1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
> [2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 50 ++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..bbcaa5d2b54b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6352,6 +6352,56 @@ a single guest_memfd file, but the bound ranges must not overlap).
>  
>  See KVM_SET_USER_MEMORY_REGION2 for additional details.
>  
> +4.143 KVM_PRE_FAULT_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_PRE_FAULT_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_pre_fault_memory (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `gpa` and `size` were invalid (e.g. not
> +             page aligned).
> +  ENOENT     The specified `gpa` is outside defined memslots.
> +  EINTR      An unmasked signal is pending and no page was processed.
> +  EFAULT     The parameter address was invalid.
> +  EOPNOTSUPP Mapping memory for a GPA is unsupported by the
> +             hypervisor, and/or for the current vCPU state/mode.

     EIO        Unexpected error happened.

> +  ========== ===============================================================
> +
> +::
> +
> +  struct kvm_pre_fault_memory {
> +	/* in/out */
> +	__u64 gpa;
> +	__u64 size;
> +	/* in */
> +	__u64 flags;
> +	__u64 padding[5];
> +  };
> +
> +KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
> +for the current vCPU state.  KVM maps memory as if the vCPU generated a
> +stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> +CoW.  However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +
> +In some cases, multiple vCPUs might share the page tables.  In this
> +case, the ioctl can be called in parallel.
> +
> +Shadow page tables cannot support this ioctl because they
> +are indexed by virtual address or nested guest physical address.
> +Calling this ioctl when the guest is using shadow page tables (for
> +example because it is running a nested guest with nested page tables)
> +will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
> +the capability to be present.
> +
> +`flags` must currently be zero.

`flags` and `padding`

> +
> +
>  5. The kvm_run structure
>  ========================
>  
> -- 
> 2.43.0
> 
> 
>

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..bbcaa5d2b54b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,56 @@  a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_PRE_FAULT_MEMORY
+--------------------------
+
+:Capability: KVM_CAP_PRE_FAULT_MEMORY
+:Architectures: none
+:Type: vcpu ioctl
+:Parameters: struct kvm_pre_fault_memory (in/out)
+:Returns: 0 on success, < 0 on error
+
+Errors:
+
+  ========== ===============================================================
+  EINVAL     The specified `gpa` and `size` were invalid (e.g. not
+             page aligned).
+  ENOENT     The specified `gpa` is outside defined memslots.
+  EINTR      An unmasked signal is pending and no page was processed.
+  EFAULT     The parameter address was invalid.
+  EOPNOTSUPP Mapping memory for a GPA is unsupported by the
+             hypervisor, and/or for the current vCPU state/mode.
+  ========== ===============================================================
+
+::
+
+  struct kvm_pre_fault_memory {
+	/* in/out */
+	__u64 gpa;
+	__u64 size;
+	/* in */
+	__u64 flags;
+	__u64 padding[5];
+  };
+
+KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
+for the current vCPU state.  KVM maps memory as if the vCPU generated a
+stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
+CoW.  However, KVM does not mark any newly created stage-2 PTE as Accessed.
+
+In some cases, multiple vCPUs might share the page tables.  In this
+case, the ioctl can be called in parallel.
+
+Shadow page tables cannot support this ioctl because they
+are indexed by virtual address or nested guest physical address.
+Calling this ioctl when the guest is using shadow page tables (for
+example because it is running a nested guest with nested page tables)
+will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
+the capability to be present.
+
+`flags` must currently be zero.
+
+
 5. The kvm_run structure
 ========================
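
For readers of the documentation above, the following is a minimal sketch of
a userspace retry loop around the ioctl.  It assumes, as the `/* in/out */`
annotation on `gpa` and `size` suggests, that KVM advances `gpa` and shrinks
`size` as pages are processed, and that EINTR means no page was processed so
the call can simply be retried.  `struct pre_fault_args` is a local mirror
of the layout in the patch (real code would use `<linux/kvm.h>`), and
`pre_fault_fn`, `pre_fault_range()`, `struct fake_vcpu` and
`fake_pre_fault()` are hypothetical names; the callback stands in for
`ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &args)` so the loop can be exercised
without /dev/kvm.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of struct kvm_pre_fault_memory as laid out in the patch. */
struct pre_fault_args {
	uint64_t gpa;		/* in/out */
	uint64_t size;		/* in/out */
	uint64_t flags;		/* in, must currently be zero */
	uint64_t padding[5];
};

/* Hypothetical stand-in for ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, args). */
typedef int (*pre_fault_fn)(void *ctx, struct pre_fault_args *args);

/* Pre-fault [gpa, gpa + size); returns 0 on success or a negative errno. */
static int pre_fault_range(pre_fault_fn fn, void *ctx,
			   uint64_t gpa, uint64_t size)
{
	/* Designated initializer zeroes flags and padding. */
	struct pre_fault_args args = { .gpa = gpa, .size = size };

	while (args.size > 0) {
		uint64_t before = args.size;
		int ret = fn(ctx, &args);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* nothing processed, retry */
			return -errno;
		}
		/* Sketch-level guard: bail out if no forward progress was made. */
		if (args.size >= before)
			return -EIO;
	}
	return 0;
}

/* Fake vCPU for demonstration: first call is interrupted, then 4 KiB per call. */
struct fake_vcpu {
	unsigned int calls;
	uint64_t faulted;
};

static int fake_pre_fault(void *ctx, struct pre_fault_args *args)
{
	struct fake_vcpu *v = ctx;
	uint64_t step = args->size < 4096 ? args->size : 4096;

	if (v->calls++ == 0) {
		errno = EINTR;
		return -1;
	}
	args->gpa += step;
	args->size -= step;
	v->faulted += step;
	return 0;
}
```

In a real VMM the callback would wrap the vcpu ioctl, and other errors such
as EOPNOTSUPP (for example while the guest is using shadow page tables)
would be reported to the caller rather than retried.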