Message ID | 20240710174031.312055-7-pbonzini@redhat.com |
---|---|
State | New, archived |
Series | KVM: Guest Memory Pre-Population API |
On 7/11/2024 1:40 AM, Paolo Bonzini wrote:
> Wire KVM_PRE_FAULT_MEMORY ioctl to __kvm_mmu_do_page_fault() to populate guest

__kvm_mmu_do_page_fault() -> kvm_mmu_do_page_fault()

> memory. It can be called right after KVM_CREATE_VCPU creates a vCPU,
> since at that point kvm_mmu_create() and kvm_init_mmu() are called and
> the vCPU is ready to invoke the KVM page fault handler.
>
> The helper function kvm_mmu_map_tdp_page take care of the logic to

kvm_mmu_map_tdp_page -> kvm_tdp_map_page()?

> process RET_PF_* return values and convert them to success or errno.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Message-ID: <9b866a0ae7147f96571c439e75429a03dcb659b6.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/Kconfig   |  1 +
>  arch/x86/kvm/mmu/mmu.c | 73 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c     |  3 ++
>  3 files changed, 77 insertions(+)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 80e5afde69f4..4287a8071a3a 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -44,6 +44,7 @@ config KVM
>  	select KVM_VFIO
>  	select HAVE_KVM_PM_NOTIFIER if PM
>  	select KVM_GENERIC_HARDWARE_ENABLING
> +	select KVM_GENERIC_PRE_FAULT_MEMORY
>  	select KVM_WERROR if WERROR
>  	help
>  	  Support hosting fully virtualized guest machines using hardware
[...]
> index ba0ad76f53bc..a6968eadd418 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4705,6 +4705,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_MEMORY_FAULT_INFO:
>  		r = 1;
>  		break;
> +	case KVM_CAP_PRE_FAULT_MEMORY:
> +		r = tdp_enabled;
> +		break;

If !CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY, this should return 0.

>  	case KVM_CAP_EXIT_HYPERCALL:
>  		r = KVM_EXIT_HYPERCALL_VALID_MASK;
>  		break;
On Thu, Jul 11, 2024 at 7:37 AM Binbin Wu <binbin.wu@linux.intel.com> wrote:
> On 7/11/2024 1:40 AM, Paolo Bonzini wrote:
> > Wire KVM_PRE_FAULT_MEMORY ioctl to __kvm_mmu_do_page_fault() to populate guest
>
> __kvm_mmu_do_page_fault() -> kvm_mmu_do_page_fault()
>
> > memory. It can be called right after KVM_CREATE_VCPU creates a vCPU,
> > since at that point kvm_mmu_create() and kvm_init_mmu() are called and
> > the vCPU is ready to invoke the KVM page fault handler.
> >
> > The helper function kvm_mmu_map_tdp_page take care of the logic to
>
> kvm_mmu_map_tdp_page -> kvm_tdp_map_page()?

Yes, will fix.

> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 80e5afde69f4..4287a8071a3a 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -44,6 +44,7 @@ config KVM
> >  	select KVM_VFIO
> >  	select HAVE_KVM_PM_NOTIFIER if PM
> >  	select KVM_GENERIC_HARDWARE_ENABLING
> > +	select KVM_GENERIC_PRE_FAULT_MEMORY
> >  	select KVM_WERROR if WERROR
> >  	help
> >  	  Support hosting fully virtualized guest machines using hardware
> [...]
> > index ba0ad76f53bc..a6968eadd418 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4705,6 +4705,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_MEMORY_FAULT_INFO:
> >  		r = 1;
> >  		break;
> > +	case KVM_CAP_PRE_FAULT_MEMORY:
> > +		r = tdp_enabled;
> > +		break;
> If !CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY, this should return 0.

This is x86-specific code, and CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY is
always selected by CONFIG_KVM on x86 (that is, it does not depend on TDX
or anything else).

Paolo
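The commit message describes kvm_tdp_map_page() as the helper that turns RET_PF_* results into success or an errno. As a rough, user-space illustration of that control flow, the C sketch below retries while the handler reports "retry" and then maps the final result. Everything in it is hypothetical: the SK_RET_PF_* names stand in for the kernel's RET_PF_* codes (their numeric values here need not match the kernel's), and sketch_do_page_fault() replays scripted results instead of calling kvm_mmu_do_page_fault(). It also leaves out the patch's signal_pending()/cond_resched() checks, the -EOPNOTSUPP rejection of non-TDP MMUs, and the WARN_ONCE().

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's RET_PF_* codes; the numeric
 * values are chosen for this sketch and need not match the kernel's. */
enum sketch_ret_pf {
	SK_RET_PF_CONTINUE = 0,
	SK_RET_PF_RETRY,
	SK_RET_PF_EMULATE,
	SK_RET_PF_INVALID,
	SK_RET_PF_FIXED,
	SK_RET_PF_SPURIOUS,
};

/* Hypothetical fault handler that replays a scripted list of results. */
struct sketch_faulter {
	const int *results;	/* SK_RET_PF_* codes or negative errnos */
	int n;			/* number of scripted results */
	int next;		/* index of the next result to return */
};

static int sketch_do_page_fault(struct sketch_faulter *f)
{
	/* Report "invalid" once the script runs out. */
	return f->next < f->n ? f->results[f->next++] : SK_RET_PF_INVALID;
}

/* Retry while the handler asks for it, then turn the final result into
 * 0 or a negative errno, following the shape of kvm_tdp_map_page(). */
static int sketch_map_page(struct sketch_faulter *f)
{
	int r;

	do {
		r = sketch_do_page_fault(f);
	} while (r == SK_RET_PF_RETRY);

	if (r < 0)		/* errors from the handler pass through */
		return r;

	switch (r) {
	case SK_RET_PF_FIXED:
	case SK_RET_PF_SPURIOUS:
		return 0;	/* the GPA is now mapped */
	case SK_RET_PF_EMULATE:
		return -ENOENT;	/* the fault would need emulation */
	default:
		return -EIO;	/* unexpected; the patch WARNs here */
	}
}
```

For instance, a script of RETRY, RETRY, FIXED yields 0 after three handler calls, while a single EMULATE yields -ENOENT and a negative errno from the handler is returned unchanged.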
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 80e5afde69f4..4287a8071a3a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -44,6 +44,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_PM_NOTIFIER if PM
 	select KVM_GENERIC_HARDWARE_ENABLING
+	select KVM_GENERIC_PRE_FAULT_MEMORY
 	select KVM_WERROR if WERROR
 	help
 	  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 152b30fa22ad..4e0e9963066f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4709,6 +4709,79 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }
 
+static int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+			    u8 *level)
+{
+	int r;
+
+	/*
+	 * Restrict to TDP page fault, since that's the only case where the MMU
+	 * is indexed by GPA.
+	 */
+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
+		return -EOPNOTSUPP;
+
+	do {
+		if (signal_pending(current))
+			return -EINTR;
+		cond_resched();
+		r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
+	} while (r == RET_PF_RETRY);
+
+	if (r < 0)
+		return r;
+
+	switch (r) {
+	case RET_PF_FIXED:
+	case RET_PF_SPURIOUS:
+		return 0;
+
+	case RET_PF_EMULATE:
+		return -ENOENT;
+
+	case RET_PF_RETRY:
+	case RET_PF_CONTINUE:
+	case RET_PF_INVALID:
+	default:
+		WARN_ONCE(1, "could not fix page fault during prefault");
+		return -EIO;
+	}
+}
+
+long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+				    struct kvm_pre_fault_memory *range)
+{
+	u64 error_code = PFERR_GUEST_FINAL_MASK;
+	u8 level = PG_LEVEL_4K;
+	u64 end;
+	int r;
+
+	/*
+	 * reload is efficient when called repeatedly, so we can do it on
+	 * every iteration.
+	 */
+	kvm_mmu_reload(vcpu);
+
+	if (kvm_arch_has_private_mem(vcpu->kvm) &&
+	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(range->gpa)))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
+	/*
+	 * Shadow paging uses GVA for kvm page fault, so restrict to
+	 * two-dimensional paging.
+	 */
+	r = kvm_tdp_map_page(vcpu, range->gpa, error_code, &level);
+	if (r < 0)
+		return r;
+
+	/*
+	 * If the mapping that covers range->gpa can use a huge page, it
+	 * may start below it or end after range->gpa + range->size.
+	 */
+	end = (range->gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
+	return min(range->size, end - range->gpa);
+}
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ba0ad76f53bc..a6968eadd418 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4705,6 +4705,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MEMORY_FAULT_INFO:
 		r = 1;
 		break;
+	case KVM_CAP_PRE_FAULT_MEMORY:
+		r = tdp_enabled;
+		break;
 	case KVM_CAP_EXIT_HYPERCALL:
 		r = KVM_EXIT_HYPERCALL_VALID_MASK;
 		break;
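kvm_arch_vcpu_pre_fault_memory() returns the number of bytes it handled, clamped both to the requested size and to the end of whatever page (possibly a huge page) ended up mapping range->gpa. The user-space C sketch below reproduces that arithmetic. The SKETCH_* macros are hypothetical stand-ins for KVM_HPAGE_SIZE()/KVM_HPAGE_MASK(), assuming the x86 levels 1 = 4 KiB, 2 = 2 MiB and 3 = 1 GiB. count_prefault_calls() is also hypothetical: it assumes the caller advances range->gpa and shrinks range->size by each return value (the generic KVM_PRE_FAULT_MEMORY code elsewhere in the series is expected to do this, but it is not part of this patch), and it pretends every fault is mapped at the same level.

```c
#include <stdint.h>

/* Hypothetical stand-ins for KVM_HPAGE_SIZE()/KVM_HPAGE_MASK() on x86:
 * level 1 = 4 KiB, level 2 = 2 MiB, level 3 = 1 GiB (9 bits per level). */
#define SKETCH_PAGE_SHIFT		12
#define SKETCH_HPAGE_SHIFT(level)	(SKETCH_PAGE_SHIFT + ((level) - 1) * 9)
#define SKETCH_HPAGE_SIZE(level)	(1ULL << SKETCH_HPAGE_SHIFT(level))
#define SKETCH_HPAGE_MASK(level)	(~(SKETCH_HPAGE_SIZE(level) - 1))

/* Bytes of [gpa, gpa + size) covered by a mapping installed at 'level',
 * mirroring the last two statements of kvm_arch_vcpu_pre_fault_memory(). */
static uint64_t prefault_progress(uint64_t gpa, uint64_t size, int level)
{
	/* End of the (huge) page that contains gpa. */
	uint64_t end = (gpa & SKETCH_HPAGE_MASK(level)) + SKETCH_HPAGE_SIZE(level);
	uint64_t covered = end - gpa;

	return size < covered ? size : covered;
}

/* Hypothetical caller loop: advance gpa and shrink size by each return
 * value until the range is exhausted; returns the number of calls made. */
static int count_prefault_calls(uint64_t gpa, uint64_t size, int level)
{
	int calls = 0;

	while (size) {
		uint64_t done = prefault_progress(gpa, size, level);

		gpa += done;
		size -= done;
		calls++;
	}
	return calls;
}
```

For example, with gpa = 0x201000, size = 0x400000 and 2 MiB mappings, the first call reports 0x1ff000 bytes (up to the 2 MiB boundary at 0x400000), the second a full 0x200000, and the third the remaining 0x1000, so the range is covered in three calls even though the last mapping extends past its end.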