[v13,51/85] KVM: VMX: Use __kvm_faultin_page() to get APIC access page/pfn

Message ID 20241010182427.1434605-52-seanjc@google.com (mailing list archive)
State Handled Elsewhere
Series KVM: Stop grabbing references to PFNMAP'd pages

Commit Message

Sean Christopherson Oct. 10, 2024, 6:23 p.m. UTC
Use __kvm_faultin_page() to get the APIC access page so that KVM can
precisely release the refcounted page, i.e. to remove yet another user
of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
page fault, the semantics are effectively the same; KVM just happens to
be mapping the pfn into a VMCS field instead of a secondary MMU.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Yan Zhao Oct. 21, 2024, 10:22 a.m. UTC | #1
On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> Use __kvm_faultin_page() to get the APIC access page so that KVM can
> precisely release the refcounted page, i.e. to remove yet another user
> of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> page fault, the semantics are effectively the same; KVM just happens to
> be mapping the pfn into a VMCS field instead of a secondary MMU.
> 
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 851be0820e04..44cc25dfebba 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6790,8 +6790,10 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_memslots *slots = kvm_memslots(kvm);
>  	struct kvm_memory_slot *slot;
> +	struct page *refcounted_page;
>  	unsigned long mmu_seq;
>  	kvm_pfn_t pfn;
> +	bool writable;
>  
>  	/* Defer reload until vmcs01 is the current VMCS. */
>  	if (is_guest_mode(vcpu)) {
> @@ -6827,7 +6829,7 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  	 * controls the APIC-access page memslot, and only deletes the memslot
>  	 * if APICv is permanently inhibited, i.e. the memslot won't reappear.
>  	 */
> -	pfn = gfn_to_pfn_memslot(slot, gfn);
> +	pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
>  	if (is_error_noslot_pfn(pfn))
>  		return;
>  
> @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
>  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
>  
>  	/*
> -	 * Do not pin apic access page in memory, the MMU notifier
> -	 * will call us again if it is migrated or swapped out.
> +	 * Do not pin the APIC access page in memory so that it can be freely
> +	 * migrated, the MMU notifier will call us again if it is migrated or
> +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> +	 * should always point at a refcounted page (if the pfn is valid).
>  	 */
> -	kvm_release_pfn_clean(pfn);
> +	if (!WARN_ON_ONCE(!refcounted_page))
> +		kvm_release_page_clean(refcounted_page);
Why isn't this
if (!WARN_ON_ONCE(!refcounted_page)) {
	if (writable)
		kvm_release_page_dirty(refcounted_page);
	else
		kvm_release_page_clean(refcounted_page);
}

or, since we know the slot is not read-only, why not simply skip passing
"writable" to __kvm_faultin_pfn() and unconditionally mark the page dirty?

if (!WARN_ON_ONCE(!refcounted_page))
	kvm_release_page_dirty(refcounted_page);

>  
>  	/*
>  	 * No need for a manual TLB flush at this point, KVM has already done a
> -- 
> 2.47.0.rc1.288.g06298d1525-goog
>
Sean Christopherson Oct. 21, 2024, 6:57 p.m. UTC | #2
On Mon, Oct 21, 2024, Yan Zhao wrote:
> On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > Use __kvm_faultin_page() to get the APIC access page so that KVM can
> > precisely release the refcounted page, i.e. to remove yet another user
> > of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> > page fault, the semantics are effectively the same; KVM just happens to
> > be mapping the pfn into a VMCS field instead of a secondary MMU.
> > 
> > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> >  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> >  
> >  	/*
> > -	 * Do not pin apic access page in memory, the MMU notifier
> > -	 * will call us again if it is migrated or swapped out.
> > +	 * Do not pin the APIC access page in memory so that it can be freely
> > +	 * migrated, the MMU notifier will call us again if it is migrated or
> > +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> > +	 * should always point at a refcounted page (if the pfn is valid).
> >  	 */
> > -	kvm_release_pfn_clean(pfn);
> > +	if (!WARN_ON_ONCE(!refcounted_page))
> > +		kvm_release_page_clean(refcounted_page);
> Why isn't this
> if (!WARN_ON_ONCE(!refcounted_page)) {
> 	if (writable)
> 		kvm_release_page_dirty(refcounted_page);
> 	else
> 		kvm_release_page_clean(refcounted_page);
> }
> 
> or, since we know the slot is not read-only, why not simply skip passing
> "writable" to __kvm_faultin_pfn() and unconditionally mark the page dirty?

__kvm_faultin_pfn() requires a non-NULL @writable.  The intent is to help ensure
the caller is actually checking whether a readable vs. writable mapping was
acquired.  For cases that explicitly pass FOLL_WRITE, it's awkward, but those
should be few and far between.
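
As a rough illustration of that contract, here is a minimal userspace sketch.
It is not the KVM implementation: mock_faultin_pfn(), mock_get_apic_pfn(),
MOCK_FOLL_WRITE, MOCK_PFN_ERROR and the fake gfn->pfn translation are all
hypothetical names invented for this example.  It only models the idea that
the helper refuses a NULL @writable, and that a caller which explicitly asks
for a write fault already knows what @writable will say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_FOLL_WRITE	0x1u	/* hypothetical stand-in for FOLL_WRITE */
#define MOCK_PFN_ERROR	(~0ul)	/* hypothetical "error pfn" value */

/*
 * Hypothetical model of a fault-in helper.  A NULL @writable is rejected so
 * that every caller has somewhere to learn which kind of mapping it got.
 */
static unsigned long mock_faultin_pfn(unsigned long gfn, unsigned int flags,
				      bool slot_readonly, bool *writable)
{
	if (!writable)
		return MOCK_PFN_ERROR;

	/* A write fault on a read-only slot cannot be satisfied. */
	if ((flags & MOCK_FOLL_WRITE) && slot_readonly)
		return MOCK_PFN_ERROR;

	/* Simplification: the mapping is writable iff the slot is. */
	*writable = !slot_readonly;
	return gfn + 0x1000;	/* arbitrary fake translation */
}

/*
 * Caller in the style of the APIC-access page path: it passes the write
 * flag on a slot it knows is writable, so @writable carries no new
 * information, but it still has to be supplied.
 */
static unsigned long mock_get_apic_pfn(unsigned long gfn)
{
	bool writable = false;
	unsigned long pfn;

	pfn = mock_faultin_pfn(gfn, MOCK_FOLL_WRITE, false, &writable);
	if (pfn != MOCK_PFN_ERROR)
		assert(writable);
	return pfn;
}
```

The awkward part is visible in mock_get_apic_pfn(): the local exists purely to
satisfy the helper's signature.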

> if (!WARN_ON_ONCE(!refcounted_page))
> 	kvm_release_page_dirty(refcounted_page);

Ya, this is probably more correct?  Though I would strongly prefer to make any
change in behavior on top of this series.  The use of kvm_release_page_clean()
was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
invalidation is in-progress"), and I suspect the only reason it added the
kvm_set_page_accessed() call is because there was no "unused" variant.  I.e. there
was no conscious decision to set Accessed but not Dirty.
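
To make the Accessed-vs-Dirty difference concrete, below is a small userspace
model of the two release flavors.  Everything in it is hypothetical (struct
mock_page and the mock_release_page_*() helpers are made up for illustration),
and it assumes the "dirty" variant marks the page dirty and then takes the
"clean" path, with the "clean" path marking the page accessed before dropping
the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: a refcount plus two flags. */
struct mock_page {
	int refcount;
	bool accessed;
	bool dirty;
};

/* "Clean" release: mark Accessed, then drop the reference. */
static void mock_release_page_clean(struct mock_page *page)
{
	if (!page)
		return;

	page->accessed = true;
	assert(page->refcount > 0);
	page->refcount--;
}

/* "Dirty" release: mark Dirty, then do everything the clean path does. */
static void mock_release_page_dirty(struct mock_page *page)
{
	if (!page)
		return;

	page->dirty = true;
	mock_release_page_clean(page);
}
```

In this model neither variant leaves the page untouched, which matches the
point above about there being no "unused" release at the time.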
Yan Zhao Oct. 22, 2024, 2:15 a.m. UTC | #3
On Mon, Oct 21, 2024 at 11:57:42AM -0700, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Yan Zhao wrote:
> > On Thu, Oct 10, 2024 at 11:23:53AM -0700, Sean Christopherson wrote:
> > > Use __kvm_faultin_page() to get the APIC access page so that KVM can
> > > precisely release the refcounted page, i.e. to remove yet another user
> > > of kvm_pfn_to_refcounted_page().  While the path isn't handling a guest
> > > page fault, the semantics are effectively the same; KVM just happens to
> > > be mapping the pfn into a VMCS field instead of a secondary MMU.
> > > 
> > > Tested-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> 
> ...
> 
> > > @@ -6838,10 +6840,13 @@ void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
> > >  		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
> > >  
> > >  	/*
> > > -	 * Do not pin apic access page in memory, the MMU notifier
> > > -	 * will call us again if it is migrated or swapped out.
> > > +	 * Do not pin the APIC access page in memory so that it can be freely
> > > +	 * migrated, the MMU notifier will call us again if it is migrated or
> > > +	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
> > > +	 * should always point at a refcounted page (if the pfn is valid).
> > >  	 */
> > > -	kvm_release_pfn_clean(pfn);
> > > +	if (!WARN_ON_ONCE(!refcounted_page))
> > > +		kvm_release_page_clean(refcounted_page);
> > Why isn't this
> > if (!WARN_ON_ONCE(!refcounted_page)) {
> > 	if (writable)
> > 		kvm_release_page_dirty(refcounted_page);
> > 	else
> > 		kvm_release_page_clean(refcounted_page);
> > }
> > 
> > or, since we know the slot is not read-only, why not simply skip passing
> > "writable" to __kvm_faultin_pfn() and unconditionally mark the page dirty?
> 
> __kvm_faultin_pfn() requires a non-NULL @writable.  The intent is to help ensure
Ah, right.

> the caller is actually checking whether a readable vs. writable mapping was
> acquired.  For cases that explicitly pass FOLL_WRITE, it's awkward, but those
> should be few and far between.
Yes, it's a little weird, though there's nothing wrong in this case with
passing "writable" and never checking its value afterwards :)

> 
> > if (!WARN_ON_ONCE(!refcounted_page))
> > 	kvm_release_page_dirty(refcounted_page);
> 
> Ya, this is probably more correct?  Though I would strongly prefer to make any
> change in behavior on top of this series.  The use of kvm_release_page_clean()
> was added by commit 878940b33d76 ("KVM: VMX: Retry APIC-access page reload if
> invalidation is in-progress"), and I suspect the only reason it added the
> kvm_set_page_accessed() call is because there was no "unused" variant.  I.e. there
> was no conscious decision to set Accessed but not Dirty.
Thanks for the explanation.
I asked that because I noticed that the policies for setting Accessed or Dirty
for the APIC access page are not aligned between L1 (only Accessed) and L2 (Dirty).
Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 851be0820e04..44cc25dfebba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6790,8 +6790,10 @@  void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	struct kvm_memory_slot *slot;
+	struct page *refcounted_page;
 	unsigned long mmu_seq;
 	kvm_pfn_t pfn;
+	bool writable;
 
 	/* Defer reload until vmcs01 is the current VMCS. */
 	if (is_guest_mode(vcpu)) {
@@ -6827,7 +6829,7 @@  void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 	 * controls the APIC-access page memslot, and only deletes the memslot
 	 * if APICv is permanently inhibited, i.e. the memslot won't reappear.
 	 */
-	pfn = gfn_to_pfn_memslot(slot, gfn);
+	pfn = __kvm_faultin_pfn(slot, gfn, FOLL_WRITE, &writable, &refcounted_page);
 	if (is_error_noslot_pfn(pfn))
 		return;
 
@@ -6838,10 +6840,13 @@  void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu)
 		vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(pfn));
 
 	/*
-	 * Do not pin apic access page in memory, the MMU notifier
-	 * will call us again if it is migrated or swapped out.
+	 * Do not pin the APIC access page in memory so that it can be freely
+	 * migrated, the MMU notifier will call us again if it is migrated or
+	 * swapped out.  KVM backs the memslot with anonymous memory, the pfn
+	 * should always point at a refcounted page (if the pfn is valid).
 	 */
-	kvm_release_pfn_clean(pfn);
+	if (!WARN_ON_ONCE(!refcounted_page))
+		kvm_release_page_clean(refcounted_page);
 
 	/*
 	 * No need for a manual TLB flush at this point, KVM has already done a