Message ID | 20230311002258.852397-15-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups |
On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
>
> Note, this is potential ABI breakage! E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest. However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions. KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;

Hmm, will page tracking work correctly when moving memslots if there are no
external users?

In the case of KVM_MR_MOVE:

kvm_prepare_memory_region(kvm, old, new, change)
  |->kvm_arch_prepare_memory_region(kvm, old, new, change)
       |->kvm_alloc_memslot_metadata(kvm, new)
            |->memset(&slot->arch, 0, sizeof(slot->arch));
            |->kvm_page_track_create_memslot(kvm, slot, npages)

The new->arch.gfn_write_track will be freshly allocated and empty.

kvm_arch_commit_memory_region(kvm, old, new, change);
  |->kvm_arch_free_memslot(kvm, old);
       |->kvm_page_track_free_memslot(slot);

The old->arch.gfn_write_track is freed afterwards.

So, in theory, the new GFNs are not write-tracked even though the old ones were.

Is that acceptable for the internal page-track user?

>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
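The call flow described in the question above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct slot_model` and the `slot_model_*()` helpers are hypothetical stand-ins, not KVM code, and the only thing they mirror is the order of operations in the call trees (zero and allocate the new slot's tracking array during "prepare", free the old slot's array during "commit", copy nothing in between).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a memslot's write-tracking state. */
struct slot_model {
	unsigned long base_gfn;
	unsigned long npages;
	unsigned short *gfn_write_track;	/* one counter per page */
};

/* Models the "prepare" step: the slot's metadata starts out zeroed. */
static int slot_model_prepare(struct slot_model *slot, unsigned long base_gfn,
			      unsigned long npages)
{
	assert(slot != NULL);

	memset(slot, 0, sizeof(*slot));
	slot->base_gfn = base_gfn;
	slot->npages = npages;
	slot->gfn_write_track = calloc(npages, sizeof(*slot->gfn_write_track));
	return slot->gfn_write_track ? 0 : -1;
}

/* Models the "commit" step for the old slot: its metadata is released. */
static void slot_model_free(struct slot_model *slot)
{
	free(slot->gfn_write_track);
	slot->gfn_write_track = NULL;
}

/* Models a move: prepare the destination, then free the source. */
static int slot_model_move(struct slot_model *src, struct slot_model *dst,
			   unsigned long new_base_gfn)
{
	if (slot_model_prepare(dst, new_base_gfn, src->npages))
		return -1;

	/* Note: the per-page counters are deliberately not copied. */
	slot_model_free(src);
	return 0;
}

/* Counts how many pages in the slot have a non-zero tracking counter. */
static unsigned long slot_model_tracked_pages(const struct slot_model *slot)
{
	unsigned long i, n = 0;

	for (i = 0; i < slot->npages; i++)
		n += slot->gfn_write_track[i] != 0;
	return n;
}
```

In this model, a slot with one tracked page reports zero tracked pages after `slot_model_move()`, which is the situation the question is asking about.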
On Wed, Mar 15, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> > Disallow moving memslots if the VM has external page-track users, i.e. if
> > KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> > correctly handle moving memory regions.
> >
> > Note, this is potential ABI breakage! E.g. userspace could move regions
> > that aren't shadowed by KVMGT without harming the guest. However, the
> > only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> > regions. KVM's own support for moving memory regions was also broken for
> > multiple years (albeit for an edge case, but arguably moving RAM is
> > itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> > new rmap and large page tracking when moving memslot").
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> ...
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 29dd6c97d145..47ac9291cd43 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >  				   struct kvm_memory_slot *new,
> >  				   enum kvm_mr_change change)
> >  {
> > +	/*
> > +	 * KVM doesn't support moving memslots when there are external page
> > +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> > +	 */
> > +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> > +		return -EINVAL;
>
> Hmm, will page tracking work correctly when moving memslots if there are no
> external users?
>
> In the case of KVM_MR_MOVE:
>
> kvm_prepare_memory_region(kvm, old, new, change)
>   |->kvm_arch_prepare_memory_region(kvm, old, new, change)
>        |->kvm_alloc_memslot_metadata(kvm, new)
>             |->memset(&slot->arch, 0, sizeof(slot->arch));
>             |->kvm_page_track_create_memslot(kvm, slot, npages)
>
> The new->arch.gfn_write_track will be freshly allocated and empty.
>
> kvm_arch_commit_memory_region(kvm, old, new, change);
>   |->kvm_arch_free_memslot(kvm, old);
>        |->kvm_page_track_free_memslot(slot);
>
> The old->arch.gfn_write_track is freed afterwards.
>
> So, in theory, the new GFNs are not write-tracked even though the old ones were.
>
> Is that acceptable for the internal page-track user?

It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that
KVM loses the write-tracking counts is benign. I suspect no VMM actually does
KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
out isn't worth the pain of having to deal with potential ABI breakage.

Though in hindsight I wish I had tried disallowing moving memslots instead of
fixing the various bugs a few years back. :-(
On Wed, Mar 15, 2023 at 08:43:54AM -0700, Sean Christopherson wrote:
> > So, in theory, the new GFNs are not write-tracked even though the old ones were.
> >
> > Is that acceptable for the internal page-track user?
>
> It works because KVM zaps all SPTEs when a memslot is moved, i.e. the fact that

Oh, yes!
And KVM will not shadow SPTEs for an invalid memslot, so there's no problem.
Thanks~

> KVM loses the write-tracking counts is benign. I suspect no VMM actually does
> KVM_MR_MOVE in conjunction with shadow paging, but the ongoing maintenance
> cost of supporting KVM_MR_MOVE is quite low at this point, so trying to rip it
> out isn't worth the pain of having to deal with potential ABI breakage.
>
> Though in hindsight I wish I had tried disallowing moving memslots instead of
> fixing the various bugs a few years back. :-(
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>

On Fri, Mar 10, 2023 at 04:22:45PM -0800, Sean Christopherson wrote:
> Disallow moving memslots if the VM has external page-track users, i.e. if
> KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
> correctly handle moving memory regions.
>
> Note, this is potential ABI breakage! E.g. userspace could move regions
> that aren't shadowed by KVMGT without harming the guest. However, the
> only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
> regions. KVM's own support for moving memory regions was also broken for
> multiple years (albeit for an edge case, but arguably moving RAM is
> itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
> new rmap and large page tracking when moving memslot").
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_page_track.h | 3 +++
>  arch/x86/kvm/mmu/page_track.c         | 5 +++++
>  arch/x86/kvm/x86.c                    | 7 +++++++
>  3 files changed, 15 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 0d65ae203fd6..6a287bcbe8a9 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
>  void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>  			  int bytes);
>  void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm);
> +
>  #endif
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 39a0863af8b4..1cfc0a0ccc23 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -321,3 +321,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
>  	return max_level;
>  }
>  EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> +
> +bool kvm_page_track_has_external_user(struct kvm *kvm)
> +{
> +	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 29dd6c97d145..47ac9291cd43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				   struct kvm_memory_slot *new,
>  				   enum kvm_mr_change change)
>  {
> +	/*
> +	 * KVM doesn't support moving memslots when there are external page
> +	 * trackers attached to the VM, i.e. if KVMGT is in use.
> +	 */
> +	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
> +		return -EINVAL;
> +
>  	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
>  		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
>  			return -EINVAL;
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 0d65ae203fd6..6a287bcbe8a9 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -77,4 +77,7 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
 void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
 			  int bytes);
 void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 39a0863af8b4..1cfc0a0ccc23 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -321,3 +321,8 @@ enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return max_level;
 }
 EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
+
+bool kvm_page_track_has_external_user(struct kvm *kvm)
+{
+	return !hlist_empty(&kvm->arch.track_notifier_head.track_notifier_list);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29dd6c97d145..47ac9291cd43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12484,6 +12484,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *new,
 				   enum kvm_mr_change change)
 {
+	/*
+	 * KVM doesn't support moving memslots when there are external page
+	 * trackers attached to the VM, i.e. if KVMGT is in use.
+	 */
+	if (change == KVM_MR_MOVE && kvm_page_track_has_external_user(kvm))
+		return -EINVAL;
+
 	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE) {
 		if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn())
 			return -EINVAL;
Disallow moving memslots if the VM has external page-track users, i.e. if
KVMGT is being used to expose a virtual GPU to the guest, as KVM doesn't
correctly handle moving memory regions.

Note, this is potential ABI breakage! E.g. userspace could move regions
that aren't shadowed by KVMGT without harming the guest. However, the
only known user of KVMGT is QEMU, and QEMU doesn't move generic memory
regions. KVM's own support for moving memory regions was also broken for
multiple years (albeit for an edge case, but arguably moving RAM is
itself an edge case), e.g. see commit edd4fa37baa6 ("KVM: x86: Allocate
new rmap and large page tracking when moving memslot").

Signed-off-by: Sean Christopherson <seanjc@google.com>

---
 arch/x86/include/asm/kvm_page_track.h | 3 +++
 arch/x86/kvm/mmu/page_track.c         | 5 +++++
 arch/x86/kvm/x86.c                    | 7 +++++++
 3 files changed, 15 insertions(+)
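The behaviour the changelog describes can be illustrated with a small user-space model. This is a hedged sketch, not the kernel implementation: `struct vm_model`, `struct notifier_model`, `enum mr_change_model` and the `vm_model_*()` functions are all hypothetical names, and the model assumes that "has an external user" means "the notifier list is non-empty". It shows only the decision being made: a move is rejected with `-EINVAL` while any external tracker is registered, and every other change type is unaffected by the new check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the memory-region change types. */
enum mr_change_model { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };

/* Hypothetical singly linked list of registered external trackers. */
struct notifier_model {
	struct notifier_model *next;
};

struct vm_model {
	struct notifier_model *track_notifiers;	/* NULL when nobody is registered */
};

/* Assumption: an external user exists exactly when the list is non-empty. */
static bool vm_model_has_external_user(const struct vm_model *vm)
{
	assert(vm != NULL);
	return vm->track_notifiers != NULL;
}

/* Models the new guard: only MOVE is rejected, and only with a tracker attached. */
static int vm_model_prepare_memory_region(const struct vm_model *vm,
					  enum mr_change_model change)
{
	if (change == MR_MOVE && vm_model_has_external_user(vm))
		return -EINVAL;

	return 0;
}
```

With an empty list, `vm_model_prepare_memory_region(&vm, MR_MOVE)` returns 0. Once a tracker is linked in, the same call returns `-EINVAL`, while `MR_CREATE` still succeeds, which matches the "potential ABI breakage" caveat: moves that used to be accepted are now refused whenever a tracker is attached, regardless of which region is being moved.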