diff mbox series

[v2,1/6] KVM: x86/mmu: Fix wrong gfn range of tlb flushing in validate_direct_spte()

Message ID c0ee12e44f2d218a0857a5e05628d05462b32bf9.1661331396.git.houwenlong.hwl@antgroup.com (mailing list archive)
State New, archived
Series KVM: x86/mmu: Fix wrong usages of range-based tlb flushing

Commit Message

Hou Wenlong Aug. 24, 2022, 9:29 a.m. UTC
The SPTE pointing to the child SP is dropped, so the
whole gfn range covered by the child SP should be flushed.
Although Hyper-V may treat a 1-page flush the same way when
the address points to a huge page, it is still better to use
the correct huge page size. Also introduce a helper function
to do range-based flushing when a direct SP is dropped, which
helps prevent future buggy uses of
kvm_flush_remote_tlbs_with_address() in such cases.

Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

David Matlack Sept. 7, 2022, 5:43 p.m. UTC | #1
On Wed, Aug 24, 2022 at 05:29:18PM +0800, Hou Wenlong wrote:
> The spte pointing to the children SP is dropped, so the
> whole gfn range covered by the children SP should be flushed.
> Although, Hyper-V may treat a 1-page flush the same if the
> address points to a huge page, it still would be better
> to use the correct size of huge page. Also introduce
> a helper function to do range-based flushing when a direct
> SP is dropped, which would help prevent future buggy use
> of kvm_flush_remote_tlbs_with_address() in such case.
> 
> Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> Suggested-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e418ef3ecfcb..a3578abd8bbc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
>  	kvm_flush_remote_tlbs_with_range(kvm, &range);
>  }
>  
> +/* Flush all memory mapped by the given direct SP. */
> +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	WARN_ON_ONCE(!sp->role.direct);
> +	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> +					   KVM_PAGES_PER_HPAGE(sp->role.level + 1));

nit: I think it would make sense to introduce
kvm_flush_remote_tlbs_gfn() in this patch since you are going to
eventually use it here anyway.

> +}
> +
>  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
>  			   unsigned int access)
>  {
> @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			return;
>  
>  		drop_parent_pte(child, sptep);
> -		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> +		kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
>  	}
>  }
>  
> -- 
> 2.31.1
>
Hou Wenlong Sept. 13, 2022, 12:07 p.m. UTC | #2
On Thu, Sep 08, 2022 at 01:43:54AM +0800, David Matlack wrote:
> On Wed, Aug 24, 2022 at 05:29:18PM +0800, Hou Wenlong wrote:
> > The spte pointing to the children SP is dropped, so the
> > whole gfn range covered by the children SP should be flushed.
> > Although, Hyper-V may treat a 1-page flush the same if the
> > address points to a huge page, it still would be better
> > to use the correct size of huge page. Also introduce
> > a helper function to do range-based flushing when a direct
> > SP is dropped, which would help prevent future buggy use
> > of kvm_flush_remote_tlbs_with_address() in such case.
> > 
> > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > Suggested-by: David Matlack <dmatlack@google.com>
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index e418ef3ecfcb..a3578abd8bbc 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> >  	kvm_flush_remote_tlbs_with_range(kvm, &range);
> >  }
> >  
> > +/* Flush all memory mapped by the given direct SP. */
> > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > +{
> > +	WARN_ON_ONCE(!sp->role.direct);
> > +	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > +					   KVM_PAGES_PER_HPAGE(sp->role.level + 1));
> 
> nit: I think it would make sense to introduce
> kvm_flush_remote_tlbs_gfn() in this patch since you are going to
> eventually use it here anyway.
>
OK, I'll do it in the next version. Thanks!
 
> > +}
> > +
> >  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> >  			   unsigned int access)
> >  {
> > @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >  			return;
> >  
> >  		drop_parent_pte(child, sptep);
> > -		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> > +		kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
> >  	}
> >  }
> >  
> > -- 
> > 2.31.1
> >
Liam Ni Sept. 15, 2022, 11:47 a.m. UTC | #3
On Tue, 13 Sept 2022 at 20:13, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
>
> On Thu, Sep 08, 2022 at 01:43:54AM +0800, David Matlack wrote:
> > On Wed, Aug 24, 2022 at 05:29:18PM +0800, Hou Wenlong wrote:
> > > The spte pointing to the children SP is dropped, so the
> > > whole gfn range covered by the children SP should be flushed.
> > > Although, Hyper-V may treat a 1-page flush the same if the
> > > address points to a huge page, it still would be better
> > > to use the correct size of huge page. Also introduce
> > > a helper function to do range-based flushing when a direct
> > > SP is dropped, which would help prevent future buggy use
> > > of kvm_flush_remote_tlbs_with_address() in such case.
> > >
> > > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > > Suggested-by: David Matlack <dmatlack@google.com>
> > > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index e418ef3ecfcb..a3578abd8bbc 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> > >     kvm_flush_remote_tlbs_with_range(kvm, &range);
> > >  }
> > >
> > > +/* Flush all memory mapped by the given direct SP. */
> > > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > +{
> > > +   WARN_ON_ONCE(!sp->role.direct);
> > > +   kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > > +                                      KVM_PAGES_PER_HPAGE(sp->role.level + 1));

Do we need the "+ 1" here? sp->role.level == 1 means a 4K page.
I think it should be "KVM_PAGES_PER_HPAGE(sp->role.level)".

> >
> > nit: I think it would make sense to introduce
> > kvm_flush_remote_tlbs_gfn() in this patch since you are going to
> > eventually use it here anyway.
> >
> OK, I'll do it in the next version. Thanks!
>
> > > +}
> > > +
> > >  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> > >                        unsigned int access)
> > >  {
> > > @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> > >                     return;
> > >
> > >             drop_parent_pte(child, sptep);
> > > -           kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> > > +           kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
> > >     }
> > >  }
> > >
> > > --
> > > 2.31.1
> > >
Hou Wenlong Sept. 16, 2022, 2:49 a.m. UTC | #4
On Thu, Sep 15, 2022 at 07:47:35PM +0800, Liam Ni wrote:
> On Tue, 13 Sept 2022 at 20:13, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> >
> > On Thu, Sep 08, 2022 at 01:43:54AM +0800, David Matlack wrote:
> > > On Wed, Aug 24, 2022 at 05:29:18PM +0800, Hou Wenlong wrote:
> > > > The spte pointing to the children SP is dropped, so the
> > > > whole gfn range covered by the children SP should be flushed.
> > > > Although, Hyper-V may treat a 1-page flush the same if the
> > > > address points to a huge page, it still would be better
> > > > to use the correct size of huge page. Also introduce
> > > > a helper function to do range-based flushing when a direct
> > > > SP is dropped, which would help prevent future buggy use
> > > > of kvm_flush_remote_tlbs_with_address() in such case.
> > > >
> > > > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > > > Suggested-by: David Matlack <dmatlack@google.com>
> > > > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e418ef3ecfcb..a3578abd8bbc 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> > > >     kvm_flush_remote_tlbs_with_range(kvm, &range);
> > > >  }
> > > >
> > > > +/* Flush all memory mapped by the given direct SP. */
> > > > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > > +{
> > > > +   WARN_ON_ONCE(!sp->role.direct);
> > > > +   kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > > > +                                      KVM_PAGES_PER_HPAGE(sp->role.level + 1));
> 
> Do we need "+1" here?    sp->role.level=1 means 4k page.
> I think here should be   “KVM_PAGES_PER_HPAGE(sp->role.level)”
>
This helper function is used to flush all memory mapped by the given
SP, not a single SPTE in the SP. If sp->role.level == 1, the SP maps
2MiB of memory, but KVM_PAGES_PER_HPAGE(sp->role.level) is only 1
page.

> > >
> > > nit: I think it would make sense to introduce
> > > kvm_flush_remote_tlbs_gfn() in this patch since you are going to
> > > eventually use it here anyway.
> > >
> > OK, I'll do it in the next version. Thanks!
> >
> > > > +}
> > > > +
> > > >  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> > > >                        unsigned int access)
> > > >  {
> > > > @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> > > >                     return;
> > > >
> > > >             drop_parent_pte(child, sptep);
> > > > -           kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> > > > +           kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
> > > >     }
> > > >  }
> > > >
> > > > --
> > > > 2.31.1
> > > >
Robert Hoo Sept. 18, 2022, 1:11 p.m. UTC | #5
On Wed, 2022-08-24 at 17:29 +0800, Hou Wenlong wrote:
> The spte pointing to the children SP is dropped, so the
> whole gfn range covered by the children SP should be flushed.
> Although, Hyper-V may treat a 1-page flush the same if the
> address points to a huge page, it still would be better
> to use the correct size of huge page. Also introduce
> a helper function to do range-based flushing when a direct
> SP is dropped, which would help prevent future buggy use
> of kvm_flush_remote_tlbs_with_address() in such case.
> 
> Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> Suggested-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e418ef3ecfcb..a3578abd8bbc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
>  	kvm_flush_remote_tlbs_with_range(kvm, &range);
>  }
>  
> +/* Flush all memory mapped by the given direct SP. */
> +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> +{
> +	WARN_ON_ONCE(!sp->role.direct);

What if !sp->role.direct? Then the flush of sp->gfn below isn't what
was intended, but it is still done. Is this operation harmless?

> +	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> +					   KVM_PAGES_PER_HPAGE(sp->role.level + 1));
> +}
> +
>  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
>  			   unsigned int access)
>  {
> @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			return;
>  
>  		drop_parent_pte(child, sptep);
> -		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> +		kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
>  	}
>  }
>
David Matlack Sept. 20, 2022, 6:08 p.m. UTC | #6
On Thu, Sep 15, 2022 at 4:47 AM Liam Ni <zhiguangni01@gmail.com> wrote:
>
> On Tue, 13 Sept 2022 at 20:13, Hou Wenlong <houwenlong.hwl@antgroup.com> wrote:
> >
> > On Thu, Sep 08, 2022 at 01:43:54AM +0800, David Matlack wrote:
> > > On Wed, Aug 24, 2022 at 05:29:18PM +0800, Hou Wenlong wrote:
> > > > The spte pointing to the children SP is dropped, so the
> > > > whole gfn range covered by the children SP should be flushed.
> > > > Although, Hyper-V may treat a 1-page flush the same if the
> > > > address points to a huge page, it still would be better
> > > > to use the correct size of huge page. Also introduce
> > > > a helper function to do range-based flushing when a direct
> > > > SP is dropped, which would help prevent future buggy use
> > > > of kvm_flush_remote_tlbs_with_address() in such case.
> > > >
> > > > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > > > Suggested-by: David Matlack <dmatlack@google.com>
> > > > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e418ef3ecfcb..a3578abd8bbc 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> > > >     kvm_flush_remote_tlbs_with_range(kvm, &range);
> > > >  }
> > > >
> > > > +/* Flush all memory mapped by the given direct SP. */
> > > > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > > +{
> > > > +   WARN_ON_ONCE(!sp->role.direct);
> > > > +   kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > > > +                                      KVM_PAGES_PER_HPAGE(sp->role.level + 1));
>
> Do we need "+1" here?    sp->role.level=1 means 4k page.
> I think here should be   “KVM_PAGES_PER_HPAGE(sp->role.level)”

Yes we need the "+ 1" here. kvm_flush_remote_tlbs_direct_sp() must
flush all memory mapped by the shadow page, which is equivalent to the
amount of memory mapped by a huge page at the next higher level. For
example, a shadow page with role.level == PG_LEVEL_4K maps 2 MiB of
the guest physical address space since 512 PTEs x 4KiB per PTE = 2MiB.

>
> > >
> > > nit: I think it would make sense to introduce
> > > kvm_flush_remote_tlbs_gfn() in this patch since you are going to
> > > eventually use it here anyway.
> > >
> > OK, I'll do it in the next version. Thanks!
> >
> > > > +}
> > > > +
> > > >  static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> > > >                        unsigned int access)
> > > >  {
> > > > @@ -2341,7 +2349,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> > > >                     return;
> > > >
> > > >             drop_parent_pte(child, sptep);
> > > > -           kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
> > > > +           kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
> > > >     }
> > > >  }
> > > >
> > > > --
> > > > 2.31.1
> > > >
David Matlack Sept. 20, 2022, 6:32 p.m. UTC | #7
On Sun, Sep 18, 2022 at 09:11:00PM +0800, Robert Hoo wrote:
> On Wed, 2022-08-24 at 17:29 +0800, Hou Wenlong wrote:
> > The spte pointing to the children SP is dropped, so the
> > whole gfn range covered by the children SP should be flushed.
> > Although, Hyper-V may treat a 1-page flush the same if the
> > address points to a huge page, it still would be better
> > to use the correct size of huge page. Also introduce
> > a helper function to do range-based flushing when a direct
> > SP is dropped, which would help prevent future buggy use
> > of kvm_flush_remote_tlbs_with_address() in such case.
> > 
> > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > Suggested-by: David Matlack <dmatlack@google.com>
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index e418ef3ecfcb..a3578abd8bbc 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> >  	kvm_flush_remote_tlbs_with_range(kvm, &range);
> >  }
> >  
> > +/* Flush all memory mapped by the given direct SP. */
> > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > +{
> > +	WARN_ON_ONCE(!sp->role.direct);
> 
> > What if !sp->role.direct? Then the flush of sp->gfn below isn't what
> > was intended, but it is still done. Is this operation harmless?

Flushing TLBs is always harmless because KVM cannot ever assume an entry is
in the TLB. However, *not* (properly) flushing TLBs can be harmful. If KVM ever
calls kvm_flush_remote_tlbs_direct_sp() with an indirect SP, that is a bug in
KVM. The TLB flush here won't be harmful, as I explained, but KVM will miss a
TLB flush.

That being said, I don't think any changes here are necessary.
kvm_flush_remote_tlbs_direct_sp() only has one caller, validate_direct_spte(),
which only operates on direct SPs. The name of the function also makes it
obvious this should only be called with a direct SP. And if we ever mess this
up in the future, we'll see the WARN_ON().
David Matlack Sept. 20, 2022, 6:44 p.m. UTC | #8
On Tue, Sep 20, 2022 at 11:32 AM David Matlack <dmatlack@google.com> wrote:
>
> On Sun, Sep 18, 2022 at 09:11:00PM +0800, Robert Hoo wrote:
> > On Wed, 2022-08-24 at 17:29 +0800, Hou Wenlong wrote:
> > > The spte pointing to the children SP is dropped, so the
> > > whole gfn range covered by the children SP should be flushed.
> > > Although, Hyper-V may treat a 1-page flush the same if the
> > > address points to a huge page, it still would be better
> > > to use the correct size of huge page. Also introduce
> > > a helper function to do range-based flushing when a direct
> > > SP is dropped, which would help prevent future buggy use
> > > of kvm_flush_remote_tlbs_with_address() in such case.
> > >
> > > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > > Suggested-by: David Matlack <dmatlack@google.com>
> > > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index e418ef3ecfcb..a3578abd8bbc 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct
> > > kvm *kvm,
> > >     kvm_flush_remote_tlbs_with_range(kvm, &range);
> > >  }
> > >
> > > +/* Flush all memory mapped by the given direct SP. */
> > > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct
> > > kvm_mmu_page *sp)
> > > +{
> > > +   WARN_ON_ONCE(!sp->role.direct);
> >
> > What if !sp->role.direct? Then the flush of sp->gfn below isn't what
> > was intended, but it is still done. Is this operation harmless?
>
> Flushing TLBs is always harmless because KVM cannot ever assume an entry is
> in the TLB. However, *not* (properly) flushing TLBs can be harmful. If KVM ever
> calls kvm_flush_remote_tlbs_direct_sp() with an indirect SP, that is a bug in
> KVM. The TLB flush here won't be harmful, as I explained, but KVM will miss a
> TLB flush.
>
> That being said, I don't think any changes here are necessary.
> kvm_flush_remote_tlbs_direct_sp() only has one caller, validate_direct_spte(),
> which only operates on direct SPs. The name of the function also makes it
> obvious this should only be called with a direct SP. And if we ever mess this
> up in the future, we'll see the WARN_ON().

That being said, we might as well replace the WARN_ON_ONCE() with
KVM_BUG_ON(). That will still do a WARN_ON_ONCE() but has the added
benefit of terminating the VM.
Robert Hoo Sept. 27, 2022, 2:54 a.m. UTC | #9
On Tue, 2022-09-20 at 11:44 -0700, David Matlack wrote:
> On Tue, Sep 20, 2022 at 11:32 AM David Matlack <dmatlack@google.com>
> wrote:
> > 
> > On Sun, Sep 18, 2022 at 09:11:00PM +0800, Robert Hoo wrote:
> > > On Wed, 2022-08-24 at 17:29 +0800, Hou Wenlong wrote:
> > > > The spte pointing to the children SP is dropped, so the
> > > > whole gfn range covered by the children SP should be flushed.
> > > > Although, Hyper-V may treat a 1-page flush the same if the
> > > > address points to a huge page, it still would be better
> > > > to use the correct size of huge page. Also introduce
> > > > a helper function to do range-based flushing when a direct
> > > > SP is dropped, which would help prevent future buggy use
> > > > of kvm_flush_remote_tlbs_with_address() in such case.
> > > > 
> > > > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > > > Suggested-by: David Matlack <dmatlack@google.com>
> > > > Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index e418ef3ecfcb..a3578abd8bbc 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -260,6 +260,14 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> > > >     kvm_flush_remote_tlbs_with_range(kvm, &range);
> > > >  }
> > > > 
> > > > +/* Flush all memory mapped by the given direct SP. */
> > > > +static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> > > > +{
> > > > +   WARN_ON_ONCE(!sp->role.direct);
> > > 
> > > What if !sp->role.direct? Then the flush of sp->gfn below isn't
> > > what was intended, but it is still done. Is this operation harmless?
> > 
> > Flushing TLBs is always harmless because KVM cannot ever assume an
> > entry is in the TLB. However, *not* (properly) flushing TLBs can be
> > harmful. If KVM ever calls kvm_flush_remote_tlbs_direct_sp() with an
> > indirect SP, that is a bug in KVM. The TLB flush here won't be
> > harmful, as I explained, but KVM will miss a TLB flush.
> > 
Yes, agreed: not harmful, just the cost of a TLB miss. Thanks.

> > That being said, I don't think any changes here are necessary.
> > kvm_flush_remote_tlbs_direct_sp() only has one caller,
> > validate_direct_spte(), which only operates on direct SPs. The name
> > of the function also makes it obvious this should only be called
> > with a direct SP. And if we ever mess this up in the future, we'll
> > see the WARN_ON().
> 
> That being said, we might as well replace the WARN_ON_ONCE() with
> KVM_BUG_ON(). That will still do a WARN_ON_ONCE() but has the added
> benefit of terminating the VM.

Yeah, that was my point: WARN_ON_ONCE() might not warn loudly enough,
as it is usually used for recoverable cases. But terminating the VM also
seems like an overreaction to me. Just my 2 cents; the whole patch is
good.
Sean Christopherson Sept. 27, 2022, 4:44 p.m. UTC | #10
On Tue, Sep 27, 2022, Robert Hoo wrote:
> On Tue, 2022-09-20 at 11:44 -0700, David Matlack wrote:
> > That being said, we might as well replace the WARN_ON_ONCE() with
> > KVM_BUG_ON(). That will still do a WARN_ON_ONCE() but has the added
> > benefit of terminating the VM.
> 
> Yeah, here was my point, WARN_ON_ONCE() might not be warning obviously
> enough, as it usually for recoverable cases. But terminating VM is also
> over action I think.

Agreed, from the helper's perspective, killing the VM is unnecessary, e.g. it can
simply flush the entire TLB as a fallback.

	if (WARN_ON_ONCE(!sp->role.direct)) {
		kvm_flush_remote_tlbs(kvm);
		return;
	}

But looking at the series as a whole, I think the better option is to just not
introduce this helper.  There's exactly one user, and if that path encounters an
indirect shadow page then KVM is deep in the weeds.  But that's a moot point,
because unless I'm misreading the flow, there's no need for the unique helper.
kvm_flush_remote_tlbs_sptep() will do the right thing even if the target SP happens
to be indirect.

If we really want to assert that the child is a direct shadow page, then
validate_direct_spte() would be the right location for such an assert.  IMO that's
unnecessary paranoia.  The odds of KVM reaching this point with a completely messed
up shadow paging tree, and without already having hosed the guest and/or yelled
loudly are very, very low.

In other words, IMO we're getting too fancy for this one and we should instead
simply do:

		child = to_shadow_page(*sptep & SPTE_BASE_ADDR_MASK);
		if (child->role.access == direct_access)
			return;

		drop_parent_pte(child, sptep);
		kvm_flush_remote_tlbs_sptep(kvm, sptep);

That has the added benefit of flushing the same "thing" that is zapped, i.e. both
operate on the parent SPTE, not the child.

Note, kvm_flush_remote_tlbs_sptep() operates on the _parent_, where the above
snippets retrieves the child.  This becomes much more obvious once spte_to_child_sp()
comes along: https://lore.kernel.org/all/20220805230513.148869-8-seanjc@google.com.

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e418ef3ecfcb..a3578abd8bbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -260,6 +260,14 @@  void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
 	kvm_flush_remote_tlbs_with_range(kvm, &range);
 }
 
+/* Flush all memory mapped by the given direct SP. */
+static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	WARN_ON_ONCE(!sp->role.direct);
+	kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+					   KVM_PAGES_PER_HPAGE(sp->role.level + 1));
+}
+
 static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
 			   unsigned int access)
 {
@@ -2341,7 +2349,7 @@  static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			return;
 
 		drop_parent_pte(child, sptep);
-		kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
+		kvm_flush_remote_tlbs_direct_sp(vcpu->kvm, child);
 	}
 }