Message ID | 20220310140911.50924-10-chao.p.peng@linux.intel.com (mailing list archive)
---|---
State | New
Series | KVM: mm: fd-based approach for supporting KVM guest private memory
On Thu, Mar 10, 2022, Chao Peng wrote:
> @@ -3890,7 +3893,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
>  }
>  
> -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
> +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	/*
> +	 * At this time private gfn has not been supported yet. Other patch
> +	 * that enables it should change this.
> +	 */
> +	return false;
> +}
> +
> +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> +				    struct kvm_page_fault *fault,
> +				    bool *is_private_pfn, int *r)

@is_private_pfn should be a field in @fault, not a separate parameter, and it
should be a const property set by the original caller.  I would also name it
"is_private", because if KVM proceeds past this point, it will be a property of
the fault/access _and_ the pfn.

I say it's a property of the fault because the below kvm_vcpu_is_private_gfn()
should instead be:

	if (fault->is_private)

The kvm_vcpu_is_private_gfn() check is TDX centric.  For SNP, private vs. shared
is communicated via error code.  For software-only (I'm being optimistic ;-) ),
we'd probably need to track private vs. shared internally in KVM, I don't think
we'd want to force it to be a property of the gfn.

Then you can also move the fault->is_private waiver into is_page_fault_stale(),
and drop the local is_private_pfn in direct_page_fault().

> +{
> +	int order;
> +	unsigned int flags = 0;
> +	struct kvm_memory_slot *slot = fault->slot;
> +	long pfn = kvm_memfile_get_pfn(slot, fault->gfn, &order);

If get_lock_pfn() and thus kvm_memfile_get_pfn() returns a pure error code instead
of multiplexing the pfn, then this can be:

	bool is_private_pfn;

	is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn, &fault->pfn, &order);

That self-documents the "pfn < 0" == shared logic.

> +
> +	if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
> +		if (pfn < 0)
> +			flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> +		else {
> +			fault->pfn = pfn;
> +			if (slot->flags & KVM_MEM_READONLY)
> +				fault->map_writable = false;
> +			else
> +				fault->map_writable = true;
> +
> +			if (order == 0)
> +				fault->max_level = PG_LEVEL_4K;

This doesn't correctly handle order > 0, but less than the next page size, in which
case max_level needs to be PG_LEVEL_4K.  It also doesn't handle the case where
max_level > PG_LEVEL_2M.

That said, I think the proper fix is to have the get_lock_pfn() API return the max
mapping level, not the order.  KVM, and presumably any other secondary MMU that might
use these APIs, doesn't care about the order of the struct page, KVM cares about the
max size/level of page it can map into the guest.  And similar to the previous patch,
"order" is specific to struct page, which we are trying to avoid.

> +			*is_private_pfn = true;

This is where KVM guarantees that is_private_pfn == fault->is_private.

> +			*r = RET_PF_FIXED;
> +			return true;

Ewww.  This is super confusing.  Ditto for the "*r = -1" magic number.  I totally
understand why you took this approach, it's just hard to follow because it kinda
follows the kvm_faultin_pfn() semantics, but then inverts true and false in this
one case.

I think the least awful option is to forego the helper and open code everything.
If we ever refactor kvm_faultin_pfn() to be less weird then we can maybe move this
to a helper.

Open coding isn't too bad if you reorganize things so that the exit-to-userspace
path is a dedicated, early check.  IMO, it's a lot easier to read this way, open
coded or not.

I think this is correct?  "is_private_pfn" and "level" are locals, everything else
is in @fault.

	if (kvm_slot_is_private(slot)) {
		is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn,
						       &fault->pfn, &level);

		if (fault->is_private != is_private_pfn) {
			if (is_private_pfn)
				kvm_memfile_put_pfn(slot, fault->pfn);

			vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
			if (fault->is_private)
				vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
			else
				vcpu->run->memory.flags = 0;
			vcpu->run->memory.padding = 0;
			vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
			vcpu->run->memory.size = PAGE_SIZE;
			*r = 0;
			return true;
		}

		/*
		 * fault->pfn is all set if the fault is for a private pfn, just
		 * need to update other metadata.
		 */
		if (fault->is_private) {
			fault->max_level = min(fault->max_level, level);
			fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
			return false;
		}

		/* Fault is shared, fallthrough to the standard path. */
	}

	async = false;

> @@ -4016,7 +4076,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	else
>  		write_lock(&vcpu->kvm->mmu_lock);
>  
> -	if (is_page_fault_stale(vcpu, fault, mmu_seq))
> +	if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq))

As above, I'd prefer this check go in is_page_fault_stale().  It means shadow MMUs
will suffer a pointless check, but I don't think that's a big issue.  Oooh, unless
we support software-only, which would play nice with nested and probably even legacy
shadow paging.  Fun :-)

>  		goto out_unlock;
>  
>  	r = make_mmu_pages_available(vcpu);
> @@ -4033,7 +4093,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		read_unlock(&vcpu->kvm->mmu_lock);
>  	else
>  		write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(fault->pfn);
> +
> +	if (is_private_pfn)

And this can be

	if (fault->is_private)

Same feedback for paging_tmpl.h.
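To make the two structural suggestions above concrete (the const fault property and the staleness waiver), here is a rough sketch. It is illustration only, not code from the series: the struct layout is abbreviated, the elided checks stand in for whatever is_page_fault_stale() already does, and fault->hva / mmu_notifier_retry_hva() reflect the helpers of that kernel era.

```c
/* mmu_internal.h: is_private becomes a const property of the fault, set once
 * by the original caller (TDX: shared bit in the GPA, SNP: page-fault error
 * code, software-only: KVM-internal tracking).
 */
struct kvm_page_fault {
	const gpa_t addr;
	const u32 error_code;
	/* ... other const properties of the access ... */
	const bool is_private;
	/* ... outputs such as slot, pfn, hva, map_writable, max_level ... */
};

/* mmu.c: private pfns come from the private memfile and are not invalidated
 * through the mmu_notifier, so a private fault can never be stale.  This is
 * the "waiver" moved out of direct_page_fault()/FNAME(page_fault).
 */
static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
				struct kvm_page_fault *fault, int mmu_seq)
{
	/* ... existing obsolete-root checks ... */

	if (fault->is_private)
		return false;

	return fault->slot &&
	       mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
}
```

With that in place, direct_page_fault() and FNAME(page_fault) no longer need a local is_private_pfn; the unlock/release path simply keys off fault->is_private, as noted at the end of the review.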
On Tue, Mar 29, 2022 at 01:07:18AM +0000, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > @@ -3890,7 +3893,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
> >  }
> >  
> > -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
> > +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> > +{
> > +	/*
> > +	 * At this time private gfn has not been supported yet. Other patch
> > +	 * that enables it should change this.
> > +	 */
> > +	return false;
> > +}
> > +
> > +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +				    struct kvm_page_fault *fault,
> > +				    bool *is_private_pfn, int *r)
> 
> @is_private_pfn should be a field in @fault, not a separate parameter, and it
> should be a const property set by the original caller.  I would also name it
> "is_private", because if KVM proceeds past this point, it will be a property of
> the fault/access _and_ the pfn.
> 
> I say it's a property of the fault because the below kvm_vcpu_is_private_gfn()
> should instead be:
> 
> 	if (fault->is_private)
> 
> The kvm_vcpu_is_private_gfn() check is TDX centric.  For SNP, private vs. shared
> is communicated via error code.  For software-only (I'm being optimistic ;-) ),
> we'd probably need to track private vs. shared internally in KVM, I don't think
> we'd want to force it to be a property of the gfn.

Makes sense.

> 
> Then you can also move the fault->is_private waiver into is_page_fault_stale(),
> and drop the local is_private_pfn in direct_page_fault().
> 
> > +{
> > +	int order;
> > +	unsigned int flags = 0;
> > +	struct kvm_memory_slot *slot = fault->slot;
> > +	long pfn = kvm_memfile_get_pfn(slot, fault->gfn, &order);
> 
> If get_lock_pfn() and thus kvm_memfile_get_pfn() returns a pure error code instead
> of multiplexing the pfn, then this can be:
> 
> 	bool is_private_pfn;
> 
> 	is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn, &fault->pfn, &order);
> 
> That self-documents the "pfn < 0" == shared logic.

Yes, agreed.

> 
> > +
> > +	if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
> > +		if (pfn < 0)
> > +			flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> > +		else {
> > +			fault->pfn = pfn;
> > +			if (slot->flags & KVM_MEM_READONLY)
> > +				fault->map_writable = false;
> > +			else
> > +				fault->map_writable = true;
> > +
> > +			if (order == 0)
> > +				fault->max_level = PG_LEVEL_4K;
> 
> This doesn't correctly handle order > 0, but less than the next page size, in which
> case max_level needs to be PG_LEVEL_4K.  It also doesn't handle the case where
> max_level > PG_LEVEL_2M.
> 
> That said, I think the proper fix is to have the get_lock_pfn() API return the max
> mapping level, not the order.  KVM, and presumably any other secondary MMU that might
> use these APIs, doesn't care about the order of the struct page, KVM cares about the
> max size/level of page it can map into the guest.  And similar to the previous patch,
> "order" is specific to struct page, which we are trying to avoid.

I remember I suggested returning the max mapping level instead of the order, but
Kirill reminded me that PG_LEVEL_* is x86-specific, so I changed back to 'order'.
It's then just a matter of whether the backing store or KVM converts 'order' into
a mapping level.

> 
> > +			*is_private_pfn = true;
> 
> This is where KVM guarantees that is_private_pfn == fault->is_private.
> 
> > +			*r = RET_PF_FIXED;
> > +			return true;
> 
> Ewww.  This is super confusing.  Ditto for the "*r = -1" magic number.  I totally
> understand why you took this approach, it's just hard to follow because it kinda
> follows the kvm_faultin_pfn() semantics, but then inverts true and false in this
> one case.
> 
> I think the least awful option is to forego the helper and open code everything.
> If we ever refactor kvm_faultin_pfn() to be less weird then we can maybe move this
> to a helper.
> 
> Open coding isn't too bad if you reorganize things so that the exit-to-userspace
> path is a dedicated, early check.  IMO, it's a lot easier to read this way, open
> coded or not.

Yes, the existing way of handling this is really awful, including the handling of
'r' that is finally returned to KVM_RUN as part of the uAPI. Let me try your
suggestion above.

> 
> I think this is correct?  "is_private_pfn" and "level" are locals, everything else
> is in @fault.
> 
> 	if (kvm_slot_is_private(slot)) {
> 		is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn,
> 						       &fault->pfn, &level);
> 
> 		if (fault->is_private != is_private_pfn) {
> 			if (is_private_pfn)
> 				kvm_memfile_put_pfn(slot, fault->pfn);
> 
> 			vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
> 			if (fault->is_private)
> 				vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
> 			else
> 				vcpu->run->memory.flags = 0;
> 			vcpu->run->memory.padding = 0;
> 			vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
> 			vcpu->run->memory.size = PAGE_SIZE;
> 			*r = 0;
> 			return true;
> 		}
> 
> 		/*
> 		 * fault->pfn is all set if the fault is for a private pfn, just
> 		 * need to update other metadata.
> 		 */
> 		if (fault->is_private) {
> 			fault->max_level = min(fault->max_level, level);
> 			fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
> 			return false;
> 		}
> 
> 		/* Fault is shared, fallthrough to the standard path. */
> 	}
> 
> 	async = false;
> 
> > @@ -4016,7 +4076,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >  	else
> >  		write_lock(&vcpu->kvm->mmu_lock);
> >  
> > -	if (is_page_fault_stale(vcpu, fault, mmu_seq))
> > +	if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq))
> 
> As above, I'd prefer this check go in is_page_fault_stale().  It means shadow MMUs
> will suffer a pointless check, but I don't think that's a big issue.  Oooh, unless
> we support software-only, which would play nice with nested and probably even legacy
> shadow paging.  Fun :-)

Sounds good.

> 
> >  		goto out_unlock;
> >  
> >  	r = make_mmu_pages_available(vcpu);
> > @@ -4033,7 +4093,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >  		read_unlock(&vcpu->kvm->mmu_lock);
> >  	else
> >  		write_unlock(&vcpu->kvm->mmu_lock);
> > -	kvm_release_pfn_clean(fault->pfn);
> > +
> > +	if (is_private_pfn)
> 
> And this can be
> 
> 	if (fault->is_private)
> 
> Same feedback for paging_tmpl.h.

Agreed.

Thanks,
Chao
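Since the memfile API sticks with "order" (PG_LEVEL_* being x86-specific, as Kirill pointed out), the conversion would land on the KVM side. Below is one possible shape of that conversion, purely as a sketch: the helper name is made up, and it exists only to show how the "order > 0 but smaller than the next page size" and "larger than 2M" cases called out in the review could be handled.

```c
/*
 * Clamp the fault's max_level to what the backing allocation can support.
 * Anything smaller than a 2M-sized backing allocation must map at 4K.
 */
static u8 order_to_max_level(int order, u8 max_level)
{
	u8 level;

	if (order >= PUD_SHIFT - PAGE_SHIFT)		/* 1GB or larger */
		level = PG_LEVEL_1G;
	else if (order >= PMD_SHIFT - PAGE_SHIFT)	/* 2MB or larger */
		level = PG_LEVEL_2M;
	else
		level = PG_LEVEL_4K;

	return min(level, max_level);
}
```

The private fault path would then do "fault->max_level = order_to_max_level(order, fault->max_level);" rather than only special-casing order == 0.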
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3b8da8b0745e..f04c823ea09a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2844,6 +2844,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
+	if (kvm_slot_is_private(slot))
+		return max_level;
+
 	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
 	return min(host_level, max_level);
 }
@@ -3890,7 +3893,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
-static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
+static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	/*
+	 * At this time private gfn has not been supported yet. Other patch
+	 * that enables it should change this.
+	 */
+	return false;
+}
+
+static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				    struct kvm_page_fault *fault,
+				    bool *is_private_pfn, int *r)
+{
+	int order;
+	unsigned int flags = 0;
+	struct kvm_memory_slot *slot = fault->slot;
+	long pfn = kvm_memfile_get_pfn(slot, fault->gfn, &order);
+
+	if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
+		if (pfn < 0)
+			flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
+		else {
+			fault->pfn = pfn;
+			if (slot->flags & KVM_MEM_READONLY)
+				fault->map_writable = false;
+			else
+				fault->map_writable = true;
+
+			if (order == 0)
+				fault->max_level = PG_LEVEL_4K;
+			*is_private_pfn = true;
+			*r = RET_PF_FIXED;
+			return true;
+		}
+	} else {
+		if (pfn < 0)
+			return false;
+
+		kvm_memfile_put_pfn(slot, pfn);
+	}
+
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+	vcpu->run->memory.flags = flags;
+	vcpu->run->memory.padding = 0;
+	vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+	vcpu->run->memory.size = PAGE_SIZE;
+	fault->pfn = -1;
+	*r = -1;
+	return true;
+}
+
+static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			    bool *is_private_pfn, int *r)
 {
 	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
@@ -3924,6 +3979,10 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		}
 	}
 
+	if (kvm_slot_is_private(slot) &&
+	    kvm_faultin_pfn_private(vcpu, fault, is_private_pfn, r))
+		return *r == RET_PF_FIXED ? false : true;
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
 					  fault->write, &fault->map_writable,
@@ -3984,6 +4043,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
 
 	unsigned long mmu_seq;
+	bool is_private_pfn = false;
 	int r;
 
 	fault->gfn = fault->addr >> PAGE_SHIFT;
@@ -4003,7 +4063,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (kvm_faultin_pfn(vcpu, fault, &r))
+	if (kvm_faultin_pfn(vcpu, fault, &is_private_pfn, &r))
 		return r;
 
 	if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r))
@@ -4016,7 +4076,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (is_page_fault_stale(vcpu, fault, mmu_seq))
+	if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;
 
 	r = make_mmu_pages_available(vcpu);
@@ -4033,7 +4093,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
+
+	if (is_private_pfn)
+		kvm_memfile_put_pfn(fault->slot, fault->pfn);
+	else
+		kvm_release_pfn_clean(fault->pfn);
+
 	return r;
 }
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 252c77805eb9..6a5736699c0a 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -825,6 +825,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	int r;
 	unsigned long mmu_seq;
 	bool is_self_change_mapping;
+	bool is_private_pfn = false;
+
 
 	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
@@ -873,7 +875,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (kvm_faultin_pfn(vcpu, fault, &r))
+	if (kvm_faultin_pfn(vcpu, fault, &is_private_pfn, &r))
 		return r;
 
 	if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r))
@@ -901,7 +903,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
 
-	if (is_page_fault_stale(vcpu, fault, mmu_seq))
+	if (!is_private_pfn && is_page_fault_stale(vcpu, fault, mmu_seq))
 		goto out_unlock;
 
 	r = make_mmu_pages_available(vcpu);
@@ -911,7 +913,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
+	if (is_private_pfn)
+		kvm_memfile_put_pfn(fault->slot, fault->pfn);
+	else
+		kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
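For completeness, this is roughly what the exit-to-userspace path above looks like from the VMM's side once KVM_RUN returns. The exit reason, flag, and kvm_run::memory layout are the ones proposed by this series (not the final upstream uAPI), and the handler below is only a hypothetical sketch of how a VMM might react to the implicit conversion request.

```c
#include <linux/kvm.h>	/* assumes the uAPI headers from this series */
#include <stdio.h>

/* Hypothetical VMM-side handling of the memory error exit filled in above. */
static int handle_memory_error(struct kvm_run *run)
{
	unsigned long long gpa  = run->memory.gpa;
	unsigned long long size = run->memory.size;

	if (run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
		/* Guest accessed the range as private but it is backed shared. */
		printf("convert [0x%llx, +0x%llx) to private\n", gpa, size);
	else
		/* Guest accessed the range as shared but it is backed private. */
		printf("convert [0x%llx, +0x%llx) to shared\n", gpa, size);

	/* ... perform the conversion against the backing memfd, then re-enter ... */
	return 0;
}
```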