Message ID | 20240419085927.3648704-2-pbonzini@redhat.com |
---|---|
State | New, archived |
Series | KVM: Guest Memory Pre-Population API |
On Fri, Apr 19, 2024 at 04:59:22AM -0400, Paolo Bonzini <pbonzini@redhat.com> wrote:

> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Adds documentation of KVM_PRE_FAULT_MEMORY ioctl. [1]
>
> It populates guest memory. It doesn't do extra operations on the
> underlying technology-specific initialization [2]. For example,
> CoCo-related operations won't be performed. Concretely for TDX, this API
> won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific APIs
> are required for such operations.
>
> The key point is to adopt a vcpu ioctl instead of a VM ioctl. First,
> populating guest memory requires vcpu. If it is VM ioctl, we need to pick
> one vcpu somehow. Secondly, vcpu ioctl allows each vcpu to invoke this
> ioctl in parallel. It helps to scale regarding guest memory size, e.g.,
> hundreds of GB.
>
> [1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
> [2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Message-ID: <9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 50 ++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..bbcaa5d2b54b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6352,6 +6352,56 @@ a single guest_memfd file, but the bound ranges must not overlap).
>
>  See KVM_SET_USER_MEMORY_REGION2 for additional details.
>
> +4.143 KVM_PRE_FAULT_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_PRE_FAULT_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_pre_fault_memory (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> +  ========== ===============================================================
> +  EINVAL     The specified `gpa` and `size` were invalid (e.g. not
> +             page aligned).
> +  ENOENT     The specified `gpa` is outside defined memslots.
> +  EINTR      An unmasked signal is pending and no page was processed.
> +  EFAULT     The parameter address was invalid.
> +  EOPNOTSUPP Mapping memory for a GPA is unsupported by the
> +             hypervisor, and/or for the current vCPU state/mode.

EIO Unexpected error happened.

> +  ========== ===============================================================
> +
> +::
> +
> +  struct kvm_pre_fault_memory {
> +	/* in/out */
> +	__u64 gpa;
> +	__u64 size;
> +	/* in */
> +	__u64 flags;
> +	__u64 padding[5];
> +  };
> +
> +KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
> +for the current vCPU state. KVM maps memory as if the vCPU generated a
> +stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> +CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +
> +In some cases, multiple vCPUs might share the page tables. In this
> +case, the ioctl can be called in parallel.
> +
> +Shadow page tables cannot support this ioctl because they
> +are indexed by virtual address or nested guest physical address.
> +Calling this ioctl when the guest is using shadow page tables (for
> +example because it is running a nested guest with nested page tables)
> +will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
> +the capability to be present.
> +
> +`flags` must currently be zero.

`flags` and `padding`

> +
> +
>  5. The kvm_run structure
>  ========================
>
> --
> 2.43.0
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..bbcaa5d2b54b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,56 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_PRE_FAULT_MEMORY
+------------------------
+
+:Capability: KVM_CAP_PRE_FAULT_MEMORY
+:Architectures: none
+:Type: vcpu ioctl
+:Parameters: struct kvm_pre_fault_memory (in/out)
+:Returns: 0 on success, < 0 on error
+
+Errors:
+
+  ========== ===============================================================
+  EINVAL     The specified `gpa` and `size` were invalid (e.g. not
+             page aligned).
+  ENOENT     The specified `gpa` is outside defined memslots.
+  EINTR      An unmasked signal is pending and no page was processed.
+  EFAULT     The parameter address was invalid.
+  EOPNOTSUPP Mapping memory for a GPA is unsupported by the
+             hypervisor, and/or for the current vCPU state/mode.
+  ========== ===============================================================
+
+::
+
+  struct kvm_pre_fault_memory {
+	/* in/out */
+	__u64 gpa;
+	__u64 size;
+	/* in */
+	__u64 flags;
+	__u64 padding[5];
+  };
+
+KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
+for the current vCPU state. KVM maps memory as if the vCPU generated a
+stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
+CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
+
+In some cases, multiple vCPUs might share the page tables. In this
+case, the ioctl can be called in parallel.
+
+Shadow page tables cannot support this ioctl because they
+are indexed by virtual address or nested guest physical address.
+Calling this ioctl when the guest is using shadow page tables (for
+example because it is running a nested guest with nested page tables)
+will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports
+the capability to be present.
+
+`flags` must currently be zero.
+
+
 5. The kvm_run structure
 ========================
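
For illustration only, here is a rough userspace sketch of how a caller might
drive the interface documented in the patch. It is not taken from the series.
The struct is a local mirror of struct kvm_pre_fault_memory so the sketch
builds without recent kernel headers, and pre_fault_all(), pre_fault_fn and
fake_pre_fault() are hypothetical names. The sketch also assumes, based only
on the "in/out" annotation on `gpa` and `size`, that the kernel advances
`gpa` and shrinks `size` as pages are processed; the patch text does not
spell that out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_pre_fault_memory from the patch (assumption:
 * identical layout), so the sketch does not need new kernel headers. */
struct pre_fault_range {
	uint64_t gpa;		/* in/out */
	uint64_t size;		/* in/out */
	uint64_t flags;		/* in: must currently be zero */
	uint64_t padding[5];
};

/* Hypothetical hook. Real code would pass a wrapper around
 * ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, range). */
typedef int (*pre_fault_fn)(int vcpu_fd, struct pre_fault_range *range);

/* Pre-fault [gpa, gpa + size), retrying on EINTR and on partial progress.
 * Returns 0 on success or a negative errno value. */
static int pre_fault_all(int vcpu_fd, uint64_t gpa, uint64_t size,
			 pre_fault_fn do_ioctl)
{
	struct pre_fault_range range;

	assert(do_ioctl != NULL);
	memset(&range, 0, sizeof(range));	/* zero flags and padding */
	range.gpa = gpa;
	range.size = size;

	while (range.size > 0) {
		uint64_t before = range.size;

		if (do_ioctl(vcpu_fd, &range) < 0) {
			if (errno == EINTR)
				continue;	/* no page processed, retry */
			return -errno;
		}
		/* Assumption: success implies progress; otherwise give up
		 * rather than spin forever. */
		if (range.size >= before)
			return -EIO;
	}
	return 0;
}

#define FAKE_PAGE_SIZE 4096ULL
static int fake_calls;

/* Stand-in for the kernel side, for demonstration only: maps one page per
 * call and reports EINTR once, on the second call. */
static int fake_pre_fault(int vcpu_fd, struct pre_fault_range *range)
{
	(void)vcpu_fd;
	fake_calls++;
	if (range->flags || (range->gpa | range->size) % FAKE_PAGE_SIZE) {
		errno = EINVAL;		/* not page aligned, or flags set */
		return -1;
	}
	if (fake_calls == 2) {
		errno = EINTR;
		return -1;
	}
	range->gpa += FAKE_PAGE_SIZE;
	range->size -= FAKE_PAGE_SIZE;
	return 0;
}
```

Because this is a vcpu ioctl, a VMM could in principle call such a helper from
each vCPU thread on a disjoint GPA range, which is the scaling argument made
in the commit message.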