[v2,03/10] KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()

Message ID	ddf1d98420f562707b11e12c416cce8fdb986bb1.1712785629.git.isaku.yamahata@intel.com (mailing list archive)
State	New, archived
Headers	Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CD6218410C; Wed, 10 Apr 2024 22:07:56 +0000 (UTC)
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, linux-kernel@vger.kernel.org, Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>, Michael Roth <michael.roth@amd.com>, David Matlack <dmatlack@google.com>, Federico Parola <federico.parola@polito.it>, Kai Huang <kai.huang@intel.com>
Subject: [PATCH v2 03/10] KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()
Date: Wed, 10 Apr 2024 15:07:29 -0700
Message-ID: <ddf1d98420f562707b11e12c416cce8fdb986bb1.1712785629.git.isaku.yamahata@intel.com>
In-Reply-To: <cover.1712785629.git.isaku.yamahata@intel.com>
References: <cover.1712785629.git.isaku.yamahata@intel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Series	KVM: Guest Memory Pre-Population API
[v2,00/10] KVM: Guest Memory Pre-Population API
[v2,01/10] KVM: Document KVM_MAP_MEMORY ioctl
[v2,02/10] KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory
[v2,03/10] KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()
[v2,04/10] KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level
[v2,05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory
[v2,06/10] KVM: x86: Implement kvm_arch_vcpu_map_memory()
[v2,07/10] KVM: x86: Always populate L1 GPA for KVM_MAP_MEMORY
[v2,08/10] KVM: x86: Add a hook in kvm_arch_vcpu_map_memory()
[v2,09/10] KVM: SVM: Implement pre_mmu_map_page() to refuse KVM_MAP_MEMORY
[v2,10/10] KVM: selftests: x86: Add test for KVM_MAP_MEMORY

Isaku Yamahata April 10, 2024, 10:07 p.m. UTC

From: Isaku Yamahata <isaku.yamahata@intel.com>

Extract out __kvm_mmu_do_page_fault() from kvm_mmu_do_page_fault().  The
inner function is to initialize struct kvm_page_fault and to call the fault
handler, and the outer function handles updating stats and converting
return code.  KVM_MAP_MEMORY will call the KVM page fault handler.

This patch makes the emulation_type always set irrelevant to the return
code.  kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),
and references the value only when PF_RET_EMULATE is returned.  Therefore,
this adjustment doesn't affect functionality.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
v2:
- Newly introduced. (Sean)
---
 arch/x86/kvm/mmu/mmu_internal.h | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)
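
For readers following along, here is a minimal user-space sketch of the emulation_type change described in the commit message. It is not the kernel code: struct mock_fault, tail_before(), tail_after() and the MOCK_* constants are hypothetical stand-ins, and the numeric values are assumptions. It only models the tail of the function, after the fault handler has returned. Whether a fault can be both private and a write to a shadow page table in practice is not established here; the combination is used purely to make the difference visible.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock values for illustration only; the kernel's real definitions differ. */
enum {
	MOCK_RET_PF_CONTINUE,
	MOCK_RET_PF_RETRY,
	MOCK_RET_PF_EMULATE,
	MOCK_RET_PF_FIXED,
};
#define MOCK_EMULTYPE_WRITE_PF_TO_SP 0x1

/* Hypothetical stand-in for the state left behind by the fault handler. */
struct mock_fault {
	int handler_ret;	/* value the fault handler returned */
	bool is_private;
	bool write_fault_to_shadow_pgtable;
};

/*
 * Before the patch: the private-memory case returns early, so
 * *emulation_type is never updated on that path.
 */
static int tail_before(const struct mock_fault *f, int *emulation_type)
{
	if (f->handler_ret == MOCK_RET_PF_EMULATE && f->is_private)
		return -EFAULT;

	if (f->write_fault_to_shadow_pgtable && emulation_type)
		*emulation_type |= MOCK_EMULTYPE_WRITE_PF_TO_SP;

	return f->handler_ret;
}

/*
 * After the patch: the return code is converted in place, and
 * *emulation_type is updated regardless of the return code.
 */
static int tail_after(const struct mock_fault *f, int *emulation_type)
{
	int r = f->handler_ret;

	if (r == MOCK_RET_PF_EMULATE && f->is_private)
		r = -EFAULT;

	if (f->write_fault_to_shadow_pgtable && emulation_type)
		*emulation_type |= MOCK_EMULTYPE_WRITE_PF_TO_SP;

	return r;
}
```

Both variants return the same code for the same input. They differ only in whether *emulation_type is written on the -EFAULT path, which a caller that reads the value only on an emulate result would not observe.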

Chao Gao April 16, 2024, 8:22 a.m. UTC | #1

>This patch makes the emulation_type always set irrelevant to the return
>code.  kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),
>and references the value only when PF_RET_EMULATE is returned.  Therefore,
>this adjustment doesn't affect functionality.

This is benign. But what's the benefit of doing this?

>+static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>+					  u64 err, bool prefetch, int *emulation_type)
> {
> 	struct kvm_page_fault fault = {
> 		.addr = cr2_or_gpa,
>@@ -318,14 +318,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
> 	}
> 
>-	/*
>-	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
>-	 * guest perspective and have already been counted at the time of the
>-	 * original fault.
>-	 */
>-	if (!prefetch)
>-		vcpu->stat.pf_taken++;
>-
> 	if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
> 		r = kvm_tdp_page_fault(vcpu, &fault);
> 	else
>@@ -333,12 +325,30 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> 
> 	if (r == RET_PF_EMULATE && fault.is_private) {
> 		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
>-		return -EFAULT;
>+		r = -EFAULT;
> 	}
> 
> 	if (fault.write_fault_to_shadow_pgtable && emulation_type)
> 		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
> 
>+	return r;
>+}
>+
>+static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>+					u64 err, bool prefetch, int *emulation_type)
>+{
>+	int r;
>+
>+	/*
>+	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
>+	 * guest perspective and have already been counted at the time of the
>+	 * original fault.
>+	 */
>+	if (!prefetch)
>+		vcpu->stat.pf_taken++;
>+
>+	r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);

bail out if r < 0?
	
>+
> 	/*
> 	 * Similar to above, prefetch faults aren't truly spurious, and the
> 	 * async #PF path doesn't do emulation.  Do count faults that are fixed
>-- 
>2.43.2
>
>

Rick Edgecombe April 16, 2024, 2:36 p.m. UTC | #2

On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Extract out __kvm_mmu_do_page_fault() from kvm_mmu_do_page_fault().  The
> inner function is to initialize struct kvm_page_fault and to call the fault
> handler, and the outer function handles updating stats and converting
> return code.  KVM_MAP_MEMORY will call the KVM page fault handler.
> 
> This patch makes the emulation_type always set irrelevant to the return
           a comma would help parse this better ^
> code.

>   kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),

Not technically correct, there are other callers that pass NULL for
emulation_type.

> and references the value only when PF_RET_EMULATE is returned.  Therefore,
> this adjustment doesn't affect functionality.

Is there a problem with dropping the argument then?

> 
> No functional change intended.

Can we not use the "intended"? It sounds like hedging for excuses.

> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> v2:
> - Newly introduced. (Sean)
> ---
>  arch/x86/kvm/mmu/mmu_internal.h | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e68a60974cf4..9baae6c223ee 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -287,8 +287,8 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
>                                       fault->is_private);
>  }
>  
> -static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -                                       u64 err, bool prefetch, int *emulation_type)
> +static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> +                                         u64 err, bool prefetch, int *emulation_type)
>  {
>         struct kvm_page_fault fault = {
>                 .addr = cr2_or_gpa,
> @@ -318,14 +318,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>                 fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
>         }
>  
> -       /*
> -        * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
> -        * guest perspective and have already been counted at the time of the
> -        * original fault.
> -        */
> -       if (!prefetch)
> -               vcpu->stat.pf_taken++;
> -
>         if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
>                 r = kvm_tdp_page_fault(vcpu, &fault);
>         else
> @@ -333,12 +325,30 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  
>         if (r == RET_PF_EMULATE && fault.is_private) {
>                 kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> -               return -EFAULT;
> +               r = -EFAULT;
>         }
>  
>         if (fault.write_fault_to_shadow_pgtable && emulation_type)
>                 *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
>  
> +       return r;
> +}
> +
> +static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> +                                       u64 err, bool prefetch, int *emulation_type)
> +{
> +       int r;
> +
> +       /*
> +        * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
> +        * guest perspective and have already been counted at the time of the
> +        * original fault.
> +        */
> +       if (!prefetch)
> +               vcpu->stat.pf_taken++;

From the name, it makes sense to not count KVM_MAP_MEMORY as a pf_taken. But
kvm_arch_async_page_ready() increments it as well. Which makes it more like a
"faulted-in" count. I think the code in this patch is ok.

> +
> +       r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);
> +
>         /*
>          * Similar to above, prefetch faults aren't truly spurious, and the
>          * async #PF path doesn't do emulation.  Do count faults that are fixed

Isaku Yamahata April 16, 2024, 11:43 p.m. UTC | #3

On Tue, Apr 16, 2024 at 04:22:35PM +0800,
Chao Gao <chao.gao@intel.com> wrote:

> 
> >This patch makes the emulation_type always set irrelevant to the return
> >code.  kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),
> >and references the value only when PF_RET_EMULATE is returned.  Therefore,
> >this adjustment doesn't affect functionality.
>
> This is benign. But what's the benefit of doing this?

To avoid incrementing vcpu->stat.  Because this was originally a VM ioctl, I
wanted to avoid touching the vCPU stats.  Now that it's a vCPU ioctl, it's fine
to increment them.

Probably we can drop this patch and use kvm_mmu_do_page_fault().

> 
> >+static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >+					  u64 err, bool prefetch, int *emulation_type)
> > {
> > 	struct kvm_page_fault fault = {
> > 		.addr = cr2_or_gpa,
> >@@ -318,14 +318,6 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > 		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
> > 	}
> > 
> >-	/*
> >-	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
> >-	 * guest perspective and have already been counted at the time of the
> >-	 * original fault.
> >-	 */
> >-	if (!prefetch)
> >-		vcpu->stat.pf_taken++;
> >-
> > 	if (IS_ENABLED(CONFIG_MITIGATION_RETPOLINE) && fault.is_tdp)
> > 		r = kvm_tdp_page_fault(vcpu, &fault);
> > 	else
> >@@ -333,12 +325,30 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > 
> > 	if (r == RET_PF_EMULATE && fault.is_private) {
> > 		kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> >-		return -EFAULT;
> >+		r = -EFAULT;
> > 	}
> > 
> > 	if (fault.write_fault_to_shadow_pgtable && emulation_type)
> > 		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
> > 
> >+	return r;
> >+}
> >+
> >+static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >+					u64 err, bool prefetch, int *emulation_type)
> >+{
> >+	int r;
> >+
> >+	/*
> >+	 * Async #PF "faults", a.k.a. prefetch faults, are not faults from the
> >+	 * guest perspective and have already been counted at the time of the
> >+	 * original fault.
> >+	 */
> >+	if (!prefetch)
> >+		vcpu->stat.pf_taken++;
> >+
> >+	r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);
> 
> bail out if r < 0?

The if clauses that follow check r against RET_PF_xxx values, which are > 0,
so a negative r falls through them without a separate bail-out.
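
To illustrate the point, a small user-space sketch follows. It assumes the stat updates in the outer function compare r for equality against individual positive RET_PF_* codes, as the reply implies. struct mock_stat, account_result() and the MOCK_RET_PF_* values are hypothetical and are not taken from the kernel headers.

```c
#include <errno.h>
#include <stdbool.h>

/* Mock return codes; only the property "checked codes are > 0" matters here. */
enum mock_ret_pf {
	MOCK_RET_PF_CONTINUE = 0,
	MOCK_RET_PF_RETRY,
	MOCK_RET_PF_EMULATE,
	MOCK_RET_PF_INVALID,
	MOCK_RET_PF_FIXED,
	MOCK_RET_PF_SPURIOUS,
};

/* Hypothetical stand-in for the per-vCPU fault counters. */
struct mock_stat {
	unsigned long pf_fixed;
	unsigned long pf_emulate;
	unsigned long pf_spurious;
};

/*
 * Account the result of a fault.  Each branch tests for one specific
 * positive code, so a negative errno matches none of them and is
 * returned unchanged with no counter touched.
 */
static int account_result(struct mock_stat *stat, int r, bool prefetch)
{
	if (!prefetch) {
		if (r == MOCK_RET_PF_FIXED)
			stat->pf_fixed++;
		else if (r == MOCK_RET_PF_EMULATE)
			stat->pf_emulate++;
		else if (r == MOCK_RET_PF_SPURIOUS)
			stat->pf_spurious++;
	}

	return r;
}
```

Under that assumption, an explicit "if (r < 0) return r;" before the checks would change nothing except the number of comparisons on the error path.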

Isaku Yamahata April 16, 2024, 11:52 p.m. UTC | #4

On Tue, Apr 16, 2024 at 02:36:31PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > Extract out __kvm_mmu_do_page_fault() from kvm_mmu_do_page_fault().  The
> > inner function is to initialize struct kvm_page_fault and to call the fault
> > handler, and the outer function handles updating stats and converting
> > return code.  KVM_MAP_MEMORY will call the KVM page fault handler.
> > 
> > This patch makes the emulation_type always set irrelevant to the return
>            a comma would help parse this better ^
> > code.
> 
> >   kvm_mmu_page_fault() is the only caller of kvm_mmu_do_page_fault(),
> 
> Not technically correct, there are other callers that pass NULL for
> emulation_type.
> 
> > and references the value only when PF_RET_EMULATE is returned.  Therefore,
> > this adjustment doesn't affect functionality.
> 
> Is there a problem with dropping the argument then?
> 
> > 
> > No functional change intended.
> 
> Can we not use the "intended"? It sounds like hedging for excuses.

Thanks for the review.
As Chao pointed out, this patch is unnecessary.  I'll use
kvm_mmu_do_page_fault() directly with updating vcpu->stat.

https://lore.kernel.org/all/20240416234334.GA3039520@ls.amr.corp.intel.com/

Paolo Bonzini April 17, 2024, 3:41 p.m. UTC | #5

On Wed, Apr 17, 2024 at 1:52 AM Isaku Yamahata <isaku.yamahata@intel.com> wrote:
> As Chao pointed out, this patch is unnecessary.  I'll use
> kvm_mmu_do_page_fault() directly with updating vcpu->stat.

Actually I prefer to have this patch.

pf_* stats do not make sense for pre-population, and updating them
confuses things because pre-population (outside TDX) has the purpose
of avoiding page faults.

Paolo
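
A rough user-space sketch of the argument for keeping the split: if pre-population calls the inner function, the pf_* counters keep counting guest faults only. Everything below is hypothetical (struct mock_vcpu, mock_inner_fault(), mock_outer_fault(), mock_prepopulate(), MOCK_PAGE_SIZE). The real KVM_MAP_MEMORY path is implemented in later patches of the series and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096ULL	/* assumed page size for the sketch */

/* Hypothetical vCPU with one stat counter and a count of mapped pages. */
struct mock_vcpu {
	unsigned long pf_taken;
	unsigned long pages_mapped;
};

/* Inner function: does the mapping work and touches no stats. */
static int mock_inner_fault(struct mock_vcpu *vcpu, uint64_t gpa)
{
	(void)gpa;
	vcpu->pages_mapped++;
	return 0;
}

/* Outer function: counts the fault, then delegates to the inner one. */
static int mock_outer_fault(struct mock_vcpu *vcpu, uint64_t gpa, bool prefetch)
{
	if (!prefetch)
		vcpu->pf_taken++;

	return mock_inner_fault(vcpu, gpa);
}

/* Pre-population: maps a range page by page through the inner function. */
static int mock_prepopulate(struct mock_vcpu *vcpu, uint64_t gpa, uint64_t size)
{
	uint64_t end = gpa + size;

	for (; gpa < end; gpa += MOCK_PAGE_SIZE) {
		int r = mock_inner_fault(vcpu, gpa);

		if (r)
			return r;
	}

	return 0;
}
```

With this shape, mapping four pages ahead of time leaves pf_taken at zero, while a later real fault through the outer function increments it once.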
