| Message ID | 20240417153450.3608097-6-pbonzini@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | KVM: Guest Memory Pre-Population API |
On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Introduce a helper function to call the KVM fault handler. It allows
> a new ioctl to invoke the KVM fault handler to populate guest memory
> without exposing RET_PF_* enums or other internal KVM MMU definitions,
> because RET_PF_* are internal to the x86 KVM MMU. The implementation
> is restricted to two-dimensional paging for simplicity; shadow paging
> uses GVAs rather than L1 GPAs for faulting, which would make the API
> difficult to use.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Message-ID: <9b866a0ae7147f96571c439e75429a03dcb659b6.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/mmu.h     |  3 +++
>  arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e8b620a85627..51ff4f67e115 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
>  	__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
>  }
>
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> +		     u8 *level);
> +
>  /*
>   * Check if a given access (described through the I/D, W/R and U/S bits of a
>   * page fault error code pfec) causes a permission fault with the given PTE
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 7fbcfc97edcc..fb2149d16f8d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4646,6 +4646,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  	return direct_page_fault(vcpu, fault);
>  }
>
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> +		     u8 *level)

If the return is an overloaded "long", then there's no need for @level, i.e. do
the level=>size conversion in this helper.

> +{
> +	int r;
> +
> +	/* Restrict to TDP page fault. */

Do we want to restrict this to the TDP MMU?  Not for any particular reason,
mostly just to keep moving towards officially deprecating/removing TDP support
from the shadow MMU.

> +	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> +		return -EOPNOTSUPP;
> +
> +	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
> +	if (r < 0)
> +		return r;
> +
> +	switch (r) {
> +	case RET_PF_RETRY:
> +		return -EAGAIN;
> +
> +	case RET_PF_FIXED:
> +	case RET_PF_SPURIOUS:
> +		return 0;

Going with the "long" idea, this becomes:

		end = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
		return min(size, end - gpa);

though I would vote for a:

		break;

so that the happy path is nicely isolated at the end of the function.

> +
> +	case RET_PF_EMULATE:
> +		return -EINVAL;
> +
> +	case RET_PF_CONTINUE:
> +	case RET_PF_INVALID:
> +	default:
> +		WARN_ON_ONCE(r);
> +		return -EIO;
> +	}
> +}
> +
>  static void nonpaging_init_context(struct kvm_mmu *context)
>  {
>  	context->page_fault = nonpaging_page_fault;
> --
> 2.43.0
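For reference, Sean's "long" idea might look roughly like the sketch below,
assuming a new @size parameter supplied by the caller; the exact signature
and the min_t() clamp are assumptions for illustration, not code from this
series:

	/*
	 * Sketch only: return the number of bytes mapped starting at @gpa,
	 * capped at @size, instead of reporting the level via an out-param.
	 */
	long kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
			      long size)
	{
		u8 level;
		gpa_t end;
		int r;

		if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
			return -EOPNOTSUPP;

		r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL,
					    &level);
		if (r < 0)
			return r;

		switch (r) {
		case RET_PF_RETRY:
			return -EAGAIN;
		case RET_PF_EMULATE:
			return -EINVAL;
		case RET_PF_FIXED:
		case RET_PF_SPURIOUS:
			break;	/* happy path: fall through to the size calc */
		default:
			WARN_ON_ONCE(r);
			return -EIO;
		}

		/* Convert the level of the installed mapping to a byte count. */
		end = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
		return min_t(long, size, end - gpa);
	}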
On Wed, Apr 17, 2024 at 11:24 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Introduce a helper function to call the KVM fault handler. It allows
> > a new ioctl to invoke the KVM fault handler to populate guest memory
> > without exposing RET_PF_* enums or other internal KVM MMU definitions,
> > because RET_PF_* are internal to the x86 KVM MMU. The implementation
> > is restricted to two-dimensional paging for simplicity; shadow paging
> > uses GVAs rather than L1 GPAs for faulting, which would make the API
> > difficult to use.
> >
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > Message-ID: <9b866a0ae7147f96571c439e75429a03dcb659b6.1712785629.git.isaku.yamahata@intel.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/mmu.h     |  3 +++
> >  arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index e8b620a85627..51ff4f67e115 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> >  	__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
> >  }
> >
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > +		     u8 *level);
> > +
> >  /*
> >   * Check if a given access (described through the I/D, W/R and U/S bits of a
> >   * page fault error code pfec) causes a permission fault with the given PTE
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 7fbcfc97edcc..fb2149d16f8d 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4646,6 +4646,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  	return direct_page_fault(vcpu, fault);
> >  }
> >
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > +		     u8 *level)
>
> If the return is an overloaded "long", then there's no need for @level, i.e. do
> the level=>size conversion in this helper.
>
> > +{
> > +	int r;
> > +
> > +	/* Restrict to TDP page fault. */
>
> Do we want to restrict this to the TDP MMU?  Not for any particular reason,
> mostly just to keep moving towards officially deprecating/removing TDP support
> from the shadow MMU.

Heh, yet another thing I briefly thought about while reviewing Isaku's
work. In the end I decided that, with the implementation being just a
regular prefault, there's not much to gain from keeping the shadow MMU
away from this.

The real ugly part is that if the memslots are zapped the
pre-population effect basically goes away (damn
kvm_arch_flush_shadow_memslot).

This is the reason why I initially thought of KVM_CHECK_EXTENSION for the VM
file descriptor, to only allow this for TDX VMs.

The real solution is to not "graduate" this ioctl to kvm/next too soon:
let's keep it in kvm-coco-queue until TDX is ready, and then make a
final decision.

Paolo

> > +	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> > +		return -EOPNOTSUPP;
> > +
> > +	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
> > +	if (r < 0)
> > +		return r;
> > +
> > +	switch (r) {
> > +	case RET_PF_RETRY:
> > +		return -EAGAIN;
> > +
> > +	case RET_PF_FIXED:
> > +	case RET_PF_SPURIOUS:
> > +		return 0;
>
> Going with the "long" idea, this becomes:
>
> 		end = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
> 		return min(size, end - gpa);
>
> though I would vote for a:
>
> 		break;
>
> so that the happy path is nicely isolated at the end of the function.
>
> > +
> > +	case RET_PF_EMULATE:
> > +		return -EINVAL;
> > +
> > +	case RET_PF_CONTINUE:
> > +	case RET_PF_INVALID:
> > +	default:
> > +		WARN_ON_ONCE(r);
> > +		return -EIO;
> > +	}
> > +}
> > +
> >  static void nonpaging_init_context(struct kvm_mmu *context)
> >  {
> >  	context->page_fault = nonpaging_page_fault;
> > --
> > 2.43.0
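If the helper were instead restricted to the TDP MMU as Sean floated, the
guard might look like the sketch below; gating on tdp_mmu_enabled is an
assumption about how that restriction would be expressed, not code from the
series:

	/*
	 * Hypothetical: require the TDP MMU proper, not merely any MMU
	 * whose fault handler happens to be kvm_tdp_page_fault.
	 */
	if (!tdp_mmu_enabled ||
	    vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
		return -EOPNOTSUPP;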
On Wed, Apr 17, 2024, Sean Christopherson wrote:
> On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> > +	case RET_PF_EMULATE:
> > +		return -EINVAL;

Almost forgot.  EINVAL on emulation is weird.  I don't know that any return
code is going to be "good", but I think just about anything is better than
EINVAL, e.g. arguably this could be -EBUSY since retrying after creating a
memslot would succeed.
On Wed, Apr 17, 2024 at 11:34 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Apr 17, 2024, Sean Christopherson wrote:
> > On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> > > +	case RET_PF_EMULATE:
> > > +		return -EINVAL;
>
> Almost forgot.  EINVAL on emulation is weird.  I don't know that any return
> code is going to be "good", but I think just about anything is better than
> EINVAL, e.g. arguably this could be -EBUSY since retrying after creating a
> memslot would succeed.

Then I guess -ENOENT?

Paolo
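If -ENOENT were adopted, the switch arm under discussion would become
something like this sketch (the comment reflects the rationale above, not
wording from the patch):

	case RET_PF_EMULATE:
		/* No memslot backs the GPA; userspace can create one and retry. */
		return -ENOENT;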
On Wed, Apr 17, 2024, Paolo Bonzini wrote:
> On Wed, Apr 17, 2024 at 11:24 PM Sean Christopherson <seanjc@google.com> wrote:
> > Do we want to restrict this to the TDP MMU?  Not for any particular reason,
> > mostly just to keep moving towards officially deprecating/removing TDP
> > support from the shadow MMU.
>
> Heh, yet another thing I briefly thought about while reviewing Isaku's
> work. In the end I decided that, with the implementation being just a
> regular prefault, there's not much to gain from keeping the shadow MMU
> away from this.

Yeah.

> The real ugly part is that if the memslots are zapped the
> pre-population effect basically goes away (damn
> kvm_arch_flush_shadow_memslot).

Ah, the eternal thorn in my side.

> This is the reason why I initially thought of KVM_CHECK_EXTENSION for the VM
> file descriptor, to only allow this for TDX VMs.

I'm fairly certain memslot deletion is mostly a QEMU-specific problem.
Allegedly (I haven't verified), our userspace+firmware doesn't delete any
memslots during boot.

And it might even be solvable for QEMU, at least for some configurations.
E.g. during boot, my QEMU+OVMF setup creates and deletes the SMRAM memslot
(despite my KVM build not supporting SMM), and deletes the lower RAM memslot
when relocating BIOS.  The SMRAM issue is definitely solvable, and the BIOS
relocation stuff seems like it's solvable too.
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e8b620a85627..51ff4f67e115 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 	__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level);
+
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7fbcfc97edcc..fb2149d16f8d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4646,6 +4646,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level)
+{
+	int r;
+
+	/* Restrict to TDP page fault. */
+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
+		return -EOPNOTSUPP;
+
+	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
+	if (r < 0)
+		return r;
+
+	switch (r) {
+	case RET_PF_RETRY:
+		return -EAGAIN;
+
+	case RET_PF_FIXED:
+	case RET_PF_SPURIOUS:
+		return 0;
+
+	case RET_PF_EMULATE:
+		return -EINVAL;
+
+	case RET_PF_CONTINUE:
+	case RET_PF_INVALID:
+	default:
+		WARN_ON_ONCE(r);
+		return -EIO;
+	}
+}
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
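For context, here is a hypothetical caller of the new helper, loosely
modeled on the pre-population ioctl this series adds elsewhere; the function
name, the error_code choice, and the range walk are illustrative assumptions,
not part of this patch:

	/* Sketch: pre-fault [gpa, gpa + size) through kvm_tdp_map_page(). */
	static int kvm_pre_fault_range(struct kvm_vcpu *vcpu, gpa_t gpa, u64 size)
	{
		u64 error_code = PFERR_WRITE_MASK;	/* assumed: prefault as a write */
		gpa_t end = gpa + size;
		u8 level = PG_LEVEL_4K;
		int r;

		while (gpa < end) {
			r = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
			if (r)
				return r;

			/* Advance past the (possibly huge) mapping just installed. */
			gpa = (gpa & KVM_HPAGE_MASK(level)) + KVM_HPAGE_SIZE(level);
		}
		return 0;
	}

Note how the @level out-parameter lets the caller stride over huge mappings
instead of re-faulting every 4KiB page; that is exactly the information the
"long" return value discussed above would fold into the helper itself.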