Message ID | 20241010182427.1434605-52-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: Stop grabbing references to PFNMAP'd pages |
On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> Use __kvm_faultin_page() get the APIC access page so that KVM can
> precisely release the refcounted page, i.e. to remove yet another user
> of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> page fault, the semantics are effectively the same; KVM just happens to
> be mapping the pfn into a VMCS field instead of a secondary MMU.
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 851be0820e04..44cc25dfebba 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6790,8 +6790,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_memslots *slots = kvm_memslots(kvm);
>  	struct kvm_memory_slot *slot;
> +	struct page *refcounted_page;
>  	unsigned long mmu_seq;
>  	kvm_pfn_t pfn;
> +	bool writable;
>
>  	/* Defer reload until vmcs01 is the current VMCS. */
>  	if (is_guest_mode(vcpu)) {
> @@ -6827,7 +6829,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  	 * controls the APIC-access page memslot, and only deletes the memslot
>  	 * if APICv is permanently inhibited, i.e. the memslot won't reappear.
>  	 */
> -	pfn = gfn_to_pfn_memslot(slot, gfn);
> +	pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
>  	if (is_error_noslot_pfn(pfn))
>  		return;
>
> @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
>
>  	/*
> -	 * Do not pin apic access page in memory, the MMU notifier
> -	 * will call us again if it is migrated or swapped out.
> +	 * Do not pin the APIC access page in memory so that it can be freely
> +	 * migrated, the MMU notifier will call us again if it is migrated or
> +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> +	 * should always point at a refcounted page (if the pfn is valid).
>  	 */
> -	kvm_release_pfn_clean(pfn);
> +	if (!WARN_ON_ONCE(!refcounted_page))
> +		kvm_release_page_clean(refcounted_page);
Why it's not
if (!WARN_ON_ONCE(!refcounted_page)) {
	if (writable)
		kvm_release_page_dirty(refcounted_page)
	else
		kvm_release_page_clean(refcounted_page)
}

or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
not read-only and then set dirty ?
if (!WARN_ON_ONCE(!refcounted_page))
	kvm_release_page_dirty(refcounted_page)

>
>  	/*
>  	 * No need for a manual TLB flush at this point, KVM has already done a
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
On Mon, Oct 21, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > Use __kvm_faultin_page() get the APIC access page so that KVM can
> > precisely release the refcounted page, i.e. to remove yet another user
> > of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> > page fault, the semantics are effectively the same; KVM just happens to
> > be mapping the pfn into a VMCS field instead of a secondary MMU.
> >
> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> >  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> >
> >  	/*
> > -	 * Do not pin apic access page in memory, the MMU notifier
> > -	 * will call us again if it is migrated or swapped out.
> > +	 * Do not pin the APIC access page in memory so that it can be freely
> > +	 * migrated, the MMU notifier will call us again if it is migrated or
> > +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> > +	 * should always point at a refcounted page (if the pfn is valid).
> >  	 */
> > -	kvm_release_pfn_clean(pfn);
> > +	if (!WARN_ON_ONCE(!refcounted_page))
> > +		kvm_release_page_clean(refcounted_page);
> Why it's not
> if (!WARN_ON_ONCE(!refcounted_page)) {
> 	if (writable)
> 		kvm_release_page_dirty(refcounted_page)
> 	else
> 		kvm_release_page_clean(refcounted_page)
> }
>
> or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
> not read-only and then set dirty ?

__kvm_faultin_pfn() requires a non-NULL @writable.  The intent is to help ensure
the caller is actually checking whether a readable vs. writable mapping was
acquired.  For cases that explicitly pass FOLL_WRITE, it's awkward, but those
should be few and far between.

> if (!WARN_ON_ONCE(!refcounted_page))
> 	kvm_release_page_dirty(refcounted_page)

Ya, this is probably more correct?  Though I would strongly prefer to make any
change in behavior on top of this series.  The use of kvm_release_page_clean()
was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
invalidation is in-progress"), and I suspect the only reason it added the
kvm_set_page_accessed() call is because there was no "unused" variant.  I.e. there
was no conscious decision to set Accessed but not Dirty.
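
For readers following the clean vs. dirty question, here is a minimal userspace sketch in C of the semantics under discussion. It is not kernel code: struct mock_page and every mock_* function are hypothetical stand-ins invented for illustration. The model assumes that the "unused" variant only drops the reference, that the "clean" variant marks the page Accessed before dropping it (per the kvm_set_page_accessed() remark above), and that the "dirty" variant marks both Dirty and Accessed. It also mirrors the stated rule that the fault-in helper rejects a NULL @writable.

```c
/* Userspace model only; none of the mock_* names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_page {
	int refcount;
	bool accessed;
	bool dirty;
};

/*
 * Models the contract described in the thread: the caller must supply
 * @writable, even when it requests write access and never reads the
 * result back.  Returns NULL on misuse, otherwise the page with one
 * extra reference held.
 */
static inline struct mock_page *mock_faultin(struct mock_page *backing,
					     bool want_write, bool *writable)
{
	if (!backing || !writable)
		return NULL;

	*writable = want_write;	/* assumption: the model grants what was asked */
	backing->refcount++;
	return backing;
}

/* Drop the reference without touching Accessed or Dirty. */
static inline void mock_release_page_unused(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Mark Accessed, then drop the reference. */
static inline void mock_release_page_clean(struct mock_page *page)
{
	page->accessed = true;
	mock_release_page_unused(page);
}

/* Mark Dirty (and, via the clean path, Accessed), then drop the reference. */
static inline void mock_release_page_dirty(struct mock_page *page)
{
	page->dirty = true;
	mock_release_page_clean(page);
}
```

In this model, the patch as posted corresponds to mock_faultin() followed by mock_release_page_clean(), which leaves the page Accessed but not Dirty; the suggested follow-up corresponds to swapping in mock_release_page_dirty().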
On Mon, Oct 21, 2024 at 11:57:42AM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Yan Zhao wrote:
> > On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > > Use __kvm_faultin_page() get the APIC access page so that KVM can
> > > precisely release the refcounted page, i.e. to remove yet another user
> > > of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> > > page fault, the semantics are effectively the same; KVM just happens to
> > > be mapping the pfn into a VMCS field instead of a secondary MMU.
> > >
> > > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
>
> ...
>
> > > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> > >  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> > >
> > >  	/*
> > > -	 * Do not pin apic access page in memory, the MMU notifier
> > > -	 * will call us again if it is migrated or swapped out.
> > > +	 * Do not pin the APIC access page in memory so that it can be freely
> > > +	 * migrated, the MMU notifier will call us again if it is migrated or
> > > +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> > > +	 * should always point at a refcounted page (if the pfn is valid).
> > >  	 */
> > > -	kvm_release_pfn_clean(pfn);
> > > +	if (!WARN_ON_ONCE(!refcounted_page))
> > > +		kvm_release_page_clean(refcounted_page);
> > Why it's not
> > if (!WARN_ON_ONCE(!refcounted_page)) {
> > 	if (writable)
> > 		kvm_release_page_dirty(refcounted_page)
> > 	else
> > 		kvm_release_page_clean(refcounted_page)
> > }
> >
> > or simply not pass "writable" to __kvm_faultin_pfn() as we know the slot is
> > not read-only and then set dirty ?
>
> __kvm_faultin_pfn() requires a non-NULL @writable.  The intent is to help ensure
Ah, right.

> the caller is actually checking whether a readable vs. writable mapping was
> acquired.  For cases that explicitly pass FOLL_WRITE, it's awkward, but those
> should be few and far between.
Yes, it's a little weird, though there's nothing wrong in this case with passing
"writable" without checking its value afterwards :)

>
> > if (!WARN_ON_ONCE(!refcounted_page))
> > 	kvm_release_page_dirty(refcounted_page)
>
> Ya, this is probably more correct?  Though I would strongly prefer to make any
> change in behavior on top of this series.  The use of kvm_release_page_clean()
> was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
> invalidation is in-progress"), and I suspect the only reason it added the
> kvm_set_page_accessed() call is because there was no "unused" variant.  I.e. there
> was no conscious decision to set Accessed but not Dirty.
Thanks for the explanation.
I asked that because I noticed that the policies for setting Accessed or Dirty
for the APIC access page are not aligned between L1 (only Accessed) and L2
(Dirty).
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 851be0820e04..44cc25dfebba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6790,8 +6790,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	struct kvm_memory_slot *slot;
+	struct page *refcounted_page;
 	unsigned long mmu_seq;
 	kvm_pfn_t pfn;
+	bool writable;
 
 	/* Defer reload until vmcs01 is the current VMCS. */
 	if (is_guest_mode(vcpu)) {
@@ -6827,7 +6829,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	 * controls the APIC-access page memslot, and only deletes the memslot
 	 * if APICv is permanently inhibited, i.e. the memslot won't reappear.
 	 */
-	pfn = gfn_to_pfn_memslot(slot, gfn);
+	pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
 	if (is_error_noslot_pfn(pfn))
 		return;
 
@@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
 
 	/*
-	 * Do not pin apic access page in memory, the MMU notifier
-	 * will call us again if it is migrated or swapped out.
+	 * Do not pin the APIC access page in memory so that it can be freely
+	 * migrated, the MMU notifier will call us again if it is migrated or
+	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
+	 * should always point at a refcounted page (if the pfn is valid).
 	 */
-	kvm_release_pfn_clean(pfn);
+	if (!WARN_ON_ONCE(!refcounted_page))
+		kvm_release_page_clean(refcounted_page);
 
 	/*
 	 * No need for a manual TLB flush at this point, KVM has already done a