Message ID | 20190202013825.51261-4-Tianyu.Lan@microsoft.com (mailing list archive) |
---|---|
State | New, archived |
Series | X86/KVM/Hyper-V: Add HV ept tlb range list flush support in KVM |
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>
> 	if (level > PT_PAGE_TABLE_LEVEL)
> 		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

sp->last_level is always true here.

Paolo

> 	if (tdp_enabled)
> 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> 				kvm_is_mmio_pfn(pfn));
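For context, the helper named in the hunk works roughly as sketched below (a paraphrase of is_last_spte() in arch/x86/kvm/mmu.c around this kernel version, not a quote from the patch). Since set_spte() only installs leaf SPTEs, either a 4 KiB PTE at PT_PAGE_TABLE_LEVEL or a large page with PT_PAGE_SIZE_MASK set just above, the expression is always true at this call site, which is Paolo's point:

	/* Paraphrased sketch of is_last_spte(); see mmu.c for the exact code. */
	static bool is_last_spte(u64 pte, int level)
	{
		/* Level-1 entries (PTEs) always map a final 4 KiB page. */
		if (level == PT_PAGE_TABLE_LEVEL)
			return true;
		/* Higher-level entries are leaves only when they map a large page. */
		if (pte & PT_PAGE_SIZE_MASK)
			return true;
		return false;
	}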
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>
> 	if (level > PT_PAGE_TABLE_LEVEL)
> 		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
is never the last level.  Page table entries for the last level do not
have a struct kvm_mmu_page.

Therefore you don't need the flag after all.  I suspect your
calculations in patch 2 are off by one, and you actually need

	hlist_for_each_entry(sp, range->flush_list, flush_link) {
		int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
		...
	}

For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
a page containing PTEs and covers an area of 2 MiB.

Thanks,

Paolo

> 	if (tdp_enabled)
> 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> 				kvm_is_mmio_pfn(pfn));
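A minimal sketch of how Paolo's suggested walk could look when building the flush ranges. The range->flush_list layout follows his snippet, but add_flush_entry() and the overall shape are assumptions about what patch 2 does, not code from the series:

	struct kvm_mmu_page *sp;

	hlist_for_each_entry(sp, range->flush_list, flush_link) {
		/*
		 * sp->gfn is the first guest frame mapped through this shadow
		 * page.  A page with role.level == N holds level-N entries,
		 * so the table as a whole spans one level above that.
		 */
		u64 start_gfn = sp->gfn;
		u64 pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);

		/* add_flush_entry() is a placeholder for whatever fills the
		 * hypercall's gpa list for [start_gfn, start_gfn + pages). */
		add_flush_entry(start_gfn, pages);
	}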
On Fri, Feb 15, 2019 at 12:32 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index ce770b446238..70cafd3f95ab 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >
> > 	if (level > PT_PAGE_TABLE_LEVEL)
> > 		spte |= PT_PAGE_SIZE_MASK;
> > +
> > +	sp->last_level = is_last_spte(spte, level);
>
> Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
> is never the last level.  Page table entries for the last level do not
> have a struct kvm_mmu_page.
>
> Therefore you don't need the flag after all.  I suspect your
> calculations in patch 2 are off by one, and you actually need
>
> 	hlist_for_each_entry(sp, range->flush_list, flush_link) {
> 		int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
> 		...
> 	}
>
> For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
> a page containing PTEs and covers an area of 2 MiB.

Yes, you are right. Thanks for pointing this out; I will fix it. The
last_level flag is there to avoid adding middle page nodes (e.g. PGD- and
PMD-level pages) to the flush list. If both leaf nodes and middle nodes
were added, the same address ranges would appear in the flush list twice.

>
> Thanks,
>
> Paolo
>
> > 	if (tdp_enabled)
> > 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> > 				kvm_is_mmio_pfn(pfn));
>
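To make the duplication argument concrete, here is a hypothetical sketch of how a queueing helper could use the flag. The function name and list parameter are invented for illustration and are not taken from the series; the idea is simply that a PMD- or PGD-level shadow page spans the same guest addresses as the leaf-level pages beneath it, so queueing both would put the same range on the flush list twice:

	/* Hypothetical helper, names invented for illustration only. */
	static void queue_sp_for_flush(struct hlist_head *flush_list,
				       struct kvm_mmu_page *sp)
	{
		/* Skip middle nodes (PMD/PGD-level tables): the ranges they
		 * span are already covered by the leaf-level pages below. */
		if (!sp->last_level)
			return;

		if (hlist_unhashed(&sp->flush_link))
			hlist_add_head(&sp->flush_link, flush_list);
	}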
On 15/02/19 16:05, Tianyu Lan wrote:
> Yes, you are right. Thanks for pointing this out; I will fix it. The
> last_level flag is there to avoid adding middle page nodes (e.g. PGD- and
> PMD-level pages) to the flush list. If both leaf nodes and middle nodes
> were added, the same address ranges would appear in the flush list twice.

Hmm, that's not easy to track.  One kvm_mmu_page could contain both leaf
and non-leaf entries (for example a huge page for 0 to 2 MB and a page
table for 2 MB to 4 MB).

Is this really needed?  First, your benchmarks so far have been done
with sp->last_level always set to true.  Second, you would only
encounter this optimization in kvm_mmu_commit_zap_page when zapping a
1 GB region (which then would be invalidated twice, at both the PMD and
PGD level) or bigger.

Paolo
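Paolo's point about mixed pages can be illustrated with a small, purely hypothetical helper: whether an entry is a leaf is a per-SPTE property, so a single per-kvm_mmu_page bool cannot capture it. The names follow mmu.c conventions of the time but the function itself is written here only as an example:

	/* Hypothetical: returns true if any present entry in this shadow
	 * page is a non-leaf, i.e. points to a lower-level page table. */
	static bool sp_has_nonleaf_entry(struct kvm_mmu_page *sp)
	{
		int i;

		for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
			u64 spte = sp->spt[i];

			/* Entries in this page sit at level sp->role.level; a
			 * PMD-level page (level 2) can mix 2 MiB large SPTEs
			 * (leaves) with pointers to PTE tables (non-leaves). */
			if (is_shadow_present_pte(spte) &&
			    !is_last_spte(spte, sp->role.level))
				return true;
		}
		return false;
	}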
On Fri, Feb 15, 2019 at 11:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 15/02/19 16:05, Tianyu Lan wrote:
> > Yes, you are right. Thanks for pointing this out; I will fix it. The
> > last_level flag is there to avoid adding middle page nodes (e.g. PGD- and
> > PMD-level pages) to the flush list. If both leaf nodes and middle nodes
> > were added, the same address ranges would appear in the flush list twice.
>
> Hmm, that's not easy to track.  One kvm_mmu_page could contain both leaf
> and non-leaf entries (for example a huge page for 0 to 2 MB and a page
> table for 2 MB to 4 MB).
>
> Is this really needed?  First, your benchmarks so far have been done
> with sp->last_level always set to true.  Second, you would only
> encounter this optimization in kvm_mmu_commit_zap_page when zapping a
> 1 GB region (which then would be invalidated twice, at both the PMD and
> PGD level) or bigger.
>
> Paolo

Hi Paolo:
	Sorry for the late response; I was tracking down a bug caused by
defining the wrong maximum flush count. I have just sent out v3. The
last_level flag patch is still at the end of the patchset; please see the
change log for details. As you said, this is an optimization and is not
strictly required. If you still have concerns about it, it can be dropped;
the other patches in the patchset should be fine on their own. Thanks.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a3d3e58fe0a..9d858d68c17a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -325,6 +325,7 @@ struct kvm_mmu_page {
 	struct hlist_node flush_link;
 	struct hlist_node hash_link;
 	bool unsync;
+	bool last_level;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ce770b446238..70cafd3f95ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
+
+	sp->last_level = is_last_spte(spte, level);
+
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));