Message ID | 20200623194027.23135-3-sean.j.christopherson@intel.com |
---|---|
State | New, archived |
Series | KVM: x86/mmu: Optimizations for kvm_get_mmu_page() |
LGTM.

Reviewed-By: Jon Cargille <jcargill@google.com>

On Tue, Jun 23, 2020 at 12:40 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> Skip the unsync checks and the write flooding clearing for fully direct
> MMUs, which are guaranteed to not have unsync'd or indirect pages (write
> flooding detection only applies to indirect pages). For TDP, this
> avoids unnecessary memory reads and writes, and for the write flooding
> count will also avoid dirtying a cache line (unsync_child_bitmap itself
> consumes a cache line, i.e. write_flooding_count is guaranteed to be in
> a different cache line than parent_ptes).
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 67f8f82e9783..c568a5c55276 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2475,6 +2475,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  					     int direct,
>  					     unsigned int access)
>  {
> +	bool direct_mmu = vcpu->arch.mmu->direct_map;
>  	union kvm_mmu_page_role role;
>  	struct hlist_head *sp_list;
>  	unsigned quadrant;
> @@ -2490,8 +2491,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	if (role.direct)
>  		role.gpte_is_8_bytes = true;
>  	role.access = access;
> -	if (!vcpu->arch.mmu->direct_map
> -	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
> +	if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
>  		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
>  		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
>  		role.quadrant = quadrant;
> @@ -2510,6 +2510,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  		if (sp->role.word != role.word)
>  			continue;
>
> +		if (direct_mmu)
> +			goto trace_get_page;
> +
>  		if (sp->unsync) {
>  			/* The page is good, but __kvm_sync_page might still end
>  			 * up zapping it.  If so, break in order to rebuild it.
> @@ -2525,6 +2528,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>
>  		__clear_sp_write_flooding_count(sp);
> +
> +trace_get_page:
>  		trace_kvm_mmu_get_page(sp, false);
>  		goto out;
>  	}
> --
> 2.26.0
>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 67f8f82e9783..c568a5c55276 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2475,6 +2475,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     int direct,
 					     unsigned int access)
 {
+	bool direct_mmu = vcpu->arch.mmu->direct_map;
 	union kvm_mmu_page_role role;
 	struct hlist_head *sp_list;
 	unsigned quadrant;
@@ -2490,8 +2491,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (role.direct)
 		role.gpte_is_8_bytes = true;
 	role.access = access;
-	if (!vcpu->arch.mmu->direct_map
-	    && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
+	if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) {
 		quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
 		quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
 		role.quadrant = quadrant;
@@ -2510,6 +2510,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->role.word != role.word)
 			continue;

+		if (direct_mmu)
+			goto trace_get_page;
+
 		if (sp->unsync) {
 			/* The page is good, but __kvm_sync_page might still end
 			 * up zapping it.  If so, break in order to rebuild it.
@@ -2525,6 +2528,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

 		__clear_sp_write_flooding_count(sp);
+
+trace_get_page:
 		trace_kvm_mmu_get_page(sp, false);
 		goto out;
 	}
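[Editor's note] To make the control-flow change easier to follow, below is a minimal, self-contained C sketch of the lookup loop after the patch. The types and names (struct toy_page, lookup_page) are hypothetical stand-ins, not KVM's; only the direct_mmu early-out mirrors the diff above.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kvm_mmu_page; not the kernel's layout. */
struct toy_page {
	unsigned int role_word;
	bool unsync;
	int write_flooding_count;
};

/* Models the hash-bucket walk in kvm_mmu_get_page() after the patch. */
static struct toy_page *lookup_page(struct toy_page *bucket, size_t n,
				    unsigned int role_word, bool direct_mmu)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_page *sp = &bucket[i];

		if (sp->role_word != role_word)
			continue;

		/*
		 * Fully direct MMUs never have unsync'd or indirect pages,
		 * so skip straight past the sync and write-flooding work.
		 */
		if (direct_mmu)
			goto trace_get_page;

		if (sp->unsync) {
			/* __kvm_sync_page() would run here for indirect MMUs. */
		}

		/* This write is what dirties a cache line on every lookup. */
		sp->write_flooding_count = 0;

trace_get_page:
		printf("got page with role %#x\n", sp->role_word);
		return sp;
	}
	return NULL;
}

int main(void)
{
	struct toy_page bucket[] = { { .role_word = 0x2a, .unsync = false } };

	lookup_page(bucket, 1, 0x2a, /*direct_mmu=*/true);
	return 0;
}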
Skip the unsync checks and the write flooding clearing for fully direct
MMUs, which are guaranteed to not have unsync'd or indirect pages (write
flooding detection only applies to indirect pages). For TDP, this
avoids unnecessary memory reads and writes, and for the write flooding
count will also avoid dirtying a cache line (unsync_child_bitmap itself
consumes a cache line, i.e. write_flooding_count is guaranteed to be in
a different cache line than parent_ptes).

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
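[Editor's note] The cache-line claim in the changelog can be sanity-checked with a small, self-contained C program. The struct below is a hypothetical sketch that only mirrors the field ordering the changelog cites (parent_ptes, then the 512-bit unsync_child_bitmap, then write_flooding_count); it is not the kernel's actual struct kvm_mmu_page, and it assumes the struct starts cache-line aligned.

#include <stddef.h>
#include <stdio.h>

#define CACHE_LINE 64

/* Hypothetical layout following the field order cited in the changelog. */
struct toy_mmu_page {
	unsigned long parent_ptes;            /* touched on every lookup */
	unsigned long unsync_child_bitmap[8]; /* 512 bits = one full cache line */
	int write_flooding_count;             /* pushed onto a later line */
};

int main(void)
{
	printf("parent_ptes          -> cache line %zu\n",
	       offsetof(struct toy_mmu_page, parent_ptes) / CACHE_LINE);
	printf("write_flooding_count -> cache line %zu\n",
	       offsetof(struct toy_mmu_page, write_flooding_count) / CACHE_LINE);
	return 0;
}

With 64-byte lines this prints line 0 for parent_ptes and line 1 for write_flooding_count: the 64-byte bitmap between them guarantees the two fields land on different lines, which is why skipping __clear_sp_write_flooding_count() for direct MMUs avoids dirtying an extra cache line.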