Message ID | 20210205044403.1559010-1-jingzhangos@google.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Remove redundant check for S2FWB
Hi Jing,

On 2021-02-05 04:44, Jing Zhang wrote:
> Remove redundant check for CPU feature S2FWB in dcache flush code
> to save some CPU cycles for every memslot flush and unmapping.

What CPU cycles? This is only a static branch. Can you actually
measure the overhead? What does it represent in the face of
a full memslot unmapping?

Thanks,

M.
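For context on the "static branch" remark: cpus_have_const_cap() is backed by a static key that is patched once after boot-time CPU feature detection, so each evaluation of the check costs roughly one trivially predicted branch. Below is a minimal userspace C sketch of that idea, using a plain boolean resolved once at startup as a stand-in for the patched static key; the names are hypothetical and this is not the kernel implementation.

```c
/*
 * Userspace analogue only: in the kernel the flag below would be a static
 * key patched at boot, making the check cheaper still (a nop or fixed jump).
 */
#include <stdbool.h>
#include <stdio.h>

static bool have_s2fwb;         /* set once at "boot", constant afterwards */

static void detect_cpu_features(void)
{
        have_s2fwb = true;      /* stand-in for reading ID_AA64MMFR2_EL1.FWB */
}

static void stage2_flush_dcache(void *addr, unsigned long size)
{
        /* the branch whose cost is being debated: trivially predictable */
        if (__builtin_expect(have_s2fwb, 1))
                return;
        printf("clean+invalidate dcache: %p + %lu bytes\n", addr, size);
}

int main(void)
{
        char page[4096];

        detect_cpu_features();
        stage2_flush_dcache(page, sizeof(page)); /* elided when FWB is present */
        return 0;
}
```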
Hi Marc,

Thanks for the comment.

On Fri, Feb 5, 2021 at 11:24 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Jing,
>
> On 2021-02-05 04:44, Jing Zhang wrote:
> > Remove redundant check for CPU feature S2FWB in dcache flush code
> > to save some CPU cycles for every memslot flush and unmapping.
>
> What CPU cycles? This is only a static branch. Can you actually
> measure the overhead? What does it represent in the face of
> a full memslot unmapping?

By CPU cycles I mean the CPU time spent on the S2FWB check.

For memslot unmapping there is actually no improvement; the check simply
moves into stage2_unmap_walker, since it was removed from
stage2_flush_dcache.

The saving is from the code path of memslot flush. The S2FWB check was
in stage2_flush_dcache, in which case, for a memslot flush, the check
was done for every page. Doing the check once at a higher level, such as
in kvm_toggle_cache, access_dcsw, or kvm_arch_prepare_memory_region,
saves that repeated work.

The redundant check is as follows (only the first check is necessary):

kvm_arch_prepare_memory_region
  -> S2FWB check
  -> stage2_flush_memslot
       -> kvm_pgtable_stage2_flush
            -> S2FWB check
            -> stage2_flush_walker
                 -> S2FWB check
                 -> __flush_dcache_area

> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...
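To illustrate the structure Jing describes, here is a simplified, self-contained C sketch. It is not kernel code: the names mirror the kernel call chain above, some intermediate levels (stage2_flush_memslot, stage2_flush_walker) are omitted, and the bodies are placeholders. Before the patch the S2FWB check appeared at every level of the chain; the patch keeps only the outermost check and lets the inner helpers flush unconditionally.

```c
/*
 * Simplified userspace model of the call chain under discussion.
 * Function names are stand-ins for the kernel functions of the same name
 * and do not share their real signatures.
 */
#include <stdbool.h>
#include <stdio.h>

static bool have_s2fwb; /* stand-in for cpus_have_const_cap(ARM64_HAS_STAGE2_FWB) */

static void stage2_flush_dcache(void *addr, unsigned long size)
{
        /* before the patch: another "if (have_s2fwb) return;" lived here */
        printf("clean+invalidate: %p + %lu bytes\n", addr, size);
}

static void kvm_pgtable_stage2_flush(void *addr, unsigned long size)
{
        /* before the patch: and another early return here */
        stage2_flush_dcache(addr, size);
}

static void kvm_arch_prepare_memory_region(void *addr, unsigned long size)
{
        if (have_s2fwb)         /* the only check that is strictly needed */
                return;
        kvm_pgtable_stage2_flush(addr, size);
}

int main(void)
{
        char memslot[8192];

        have_s2fwb = false;     /* pretend the CPU lacks FWB, so flushes happen */
        kvm_arch_prepare_memory_region(memslot, sizeof(memslot));
        return 0;
}
```

(As the rest of the thread points out, the inner check is reached per walked stage-2 range rather than per page, so the saving is smaller than the commit message suggests.)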
On Fri, Feb 05, 2021 at 04:44:03AM +0000, Jing Zhang wrote:
> Remove redundant check for CPU feature S2FWB in dcache flush code
> to save some CPU cycles for every memslot flush and unmapping.
> And move the S2FWB check to outer functions to avoid future
> redundancy and keep consistent with other usage like in
> access_dcsw and kvm_arch_prepare_memory_region.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 9 ++-------
>  arch/arm64/kvm/mmu.c         | 3 ++-
>  2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index bdf8e55ed308..afd57564b1cb 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -642,9 +642,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>
>  static void stage2_flush_dcache(void *addr, u64 size)
>  {
> -        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> -                return;
> -
>          __flush_dcache_area(addr, size);
>  }
>
> @@ -670,7 +667,8 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>
>          if (page_count(virt_to_page(childp)) != 1)
>                  return 0;
> -        } else if (stage2_pte_cacheable(pte)) {
> +        } else if (stage2_pte_cacheable(pte) &&
> +                   !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
>                  need_flush = true;
>          }
>
> @@ -846,9 +844,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
>                  .flags = KVM_PGTABLE_WALK_LEAF,
>          };
>
> -        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> -                return 0;
> -
>          return kvm_pgtable_walk(pgt, addr, size, &walker);
>  }

I think we should leave pgtable.c as it is: there's no benefit from this
change on the unmap path, and the other path involves the case where the
caller has asked for a flush and we can elide it.

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7d2257cc5438..53130ed23304 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1458,7 +1458,8 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>           * If switching it off, need to clean the caches.
>           * Clean + invalidate does the trick always.
>           */
> -        if (now_enabled != was_enabled)
> +        if (now_enabled != was_enabled &&
> +            !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>                  stage2_flush_vm(vcpu->kvm);

This change looks fine, but I don't grok the justification in your follow-up
email. You say:

  | The saving is from the code path of memslot flush. The S2FWB check was
  | in stage2_flush_dcache, in which case, for a memslot flush, the check
  | was done for every page.

but I don't see where this is called for every page. It looks to me like it's
called for every pgd in the range, which is a very different kettle of frogs.

What am I missing?

Will
Hi Will,

On Mon, Mar 8, 2021 at 10:49 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Feb 05, 2021 at 04:44:03AM +0000, Jing Zhang wrote:
> > Remove redundant check for CPU feature S2FWB in dcache flush code
> > to save some CPU cycles for every memslot flush and unmapping.
> > And move the S2FWB check to outer functions to avoid future
> > redundancy and keep consistent with other usage like in
> > access_dcsw and kvm_arch_prepare_memory_region.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 9 ++-------
> >  arch/arm64/kvm/mmu.c         | 3 ++-
> >  2 files changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index bdf8e55ed308..afd57564b1cb 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -642,9 +642,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >
> >  static void stage2_flush_dcache(void *addr, u64 size)
> >  {
> > -        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > -                return;
> > -
> >          __flush_dcache_area(addr, size);
> >  }
> >
> > @@ -670,7 +667,8 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >
> >          if (page_count(virt_to_page(childp)) != 1)
> >                  return 0;
> > -        } else if (stage2_pte_cacheable(pte)) {
> > +        } else if (stage2_pte_cacheable(pte) &&
> > +                   !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> >                  need_flush = true;
> >          }
> >
> > @@ -846,9 +844,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
> >                  .flags = KVM_PGTABLE_WALK_LEAF,
> >          };
> >
> > -        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> > -                return 0;
> > -
> >          return kvm_pgtable_walk(pgt, addr, size, &walker);
> >  }
>
> I think we should leave pgtable.c as it is: there's no benefit from this
> change on the unmap path, and the other path involves the case where the
> caller has asked for a flush and we can elide it.

Agreed.

>
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 7d2257cc5438..53130ed23304 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1458,7 +1458,8 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
> >           * If switching it off, need to clean the caches.
> >           * Clean + invalidate does the trick always.
> >           */
> > -        if (now_enabled != was_enabled)
> > +        if (now_enabled != was_enabled &&
> > +            !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> >                  stage2_flush_vm(vcpu->kvm);
>
> This change looks fine, but I don't grok the justification in your follow-up
> email. You say:
>
>   | The saving is from the code path of memslot flush. The S2FWB check was
>   | in stage2_flush_dcache, in which case, for a memslot flush, the check
>   | was done for every page.
>
> but I don't see where this is called for every page. It looks to me like it's
> called for every pgd in the range, which is a very different kettle of frogs.
>
> What am I missing?

You are right. It is called for every pgd in the range instead of for
every page.

>
> Will

Thanks,
Jing
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index bdf8e55ed308..afd57564b1cb 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -642,9 +642,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 static void stage2_flush_dcache(void *addr, u64 size)
 {
-        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-                return;
-
         __flush_dcache_area(addr, size);
 }
 
@@ -670,7 +667,8 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 
         if (page_count(virt_to_page(childp)) != 1)
                 return 0;
-        } else if (stage2_pte_cacheable(pte)) {
+        } else if (stage2_pte_cacheable(pte) &&
+                   !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
                 need_flush = true;
         }
 
@@ -846,9 +844,6 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
                 .flags = KVM_PGTABLE_WALK_LEAF,
         };
 
-        if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
-                return 0;
-
         return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7d2257cc5438..53130ed23304 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1458,7 +1458,8 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
          * If switching it off, need to clean the caches.
          * Clean + invalidate does the trick always.
          */
-        if (now_enabled != was_enabled)
+        if (now_enabled != was_enabled &&
+            !cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
                 stage2_flush_vm(vcpu->kvm);
 
         /* Caches are now on, stop trapping VM ops (until a S/W op) */
Remove redundant check for CPU feature S2FWB in dcache flush code
to save some CPU cycles for every memslot flush and unmapping.
And move the S2FWB check to outer functions to avoid future
redundancy and keep consistent with other usage like in
access_dcsw and kvm_arch_prepare_memory_region.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 9 ++-------
 arch/arm64/kvm/mmu.c         | 3 ++-
 2 files changed, 4 insertions(+), 8 deletions(-)


base-commit: a8ac864a7d6dbc2fc43081b1eecd9e0183065d47