From patchwork Tue Aug 16 06:46:47 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Guangrong X-Patchwork-Id: 1070732 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.4) with ESMTP id p7G6jEvD016744 for ; Tue, 16 Aug 2011 06:45:14 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752582Ab1HPGoy (ORCPT ); Tue, 16 Aug 2011 02:44:54 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:51775 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751282Ab1HPGox (ORCPT ); Tue, 16 Aug 2011 02:44:53 -0400 Received: from tang.cn.fujitsu.com (tang.cn.fujitsu.com [10.167.250.3]) by song.cn.fujitsu.com (Postfix) with ESMTP id 65988170148; Tue, 16 Aug 2011 14:44:47 +0800 (CST) Received: from mailserver.fnst.cn.fujitsu.com (tang.cn.fujitsu.com [127.0.0.1]) by tang.cn.fujitsu.com (8.14.3/8.13.1) with ESMTP id p7G6ijJ7001394; Tue, 16 Aug 2011 14:44:45 +0800 Received: from localhost.localdomain ([10.167.225.99]) by mailserver.fnst.cn.fujitsu.com (Lotus Domino Release 8.5.1FP4) with ESMTP id 2011081614433672-20554 ; Tue, 16 Aug 2011 14:43:36 +0800 Message-ID: <4E4A1257.5080204@cn.fujitsu.com> Date: Tue, 16 Aug 2011 14:46:47 +0800 From: Xiao Guangrong User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Thunderbird/3.1.10 MIME-Version: 1.0 To: Avi Kivity CC: Marcelo Tosatti , LKML , KVM Subject: [PATCH 11/11] KVM: MMU: improve write flooding detected References: <4E4A10E8.5090705@cn.fujitsu.com> In-Reply-To: <4E4A10E8.5090705@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.1FP4|July 25, 2010) at 2011-08-16 14:43:36, Serialize by Router on mailserver/fnst(Release 8.5.1FP4|July 25, 2010) at 2011-08-16 14:43:38, Serialize complete at 2011-08-16 14:43:38 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Tue, 16 Aug 2011 06:45:14 +0000 (UTC) Detecting write-flooding does not work well, when we handle page written, if the last speculative spte is not accessed, we treat the page is write-flooding, however, we can speculative spte on many path, such as pte prefetch, page synced, that means the last speculative spte may be not point to the written page and the written page can be accessed via other sptes, so depends on the Accessed bit of the last speculative spte is not enough Instead of detected page accessed, we can detect whether the spte is accessed or not, if the spte is not accessed but it is written frequently, we treat is not a page table or it not used for a long time Signed-off-by: Xiao Guangrong --- arch/x86/include/asm/kvm_host.h | 6 +--- arch/x86/kvm/mmu.c | 48 +++++++++------------------------------ arch/x86/kvm/paging_tmpl.h | 9 +----- 3 files changed, 15 insertions(+), 48 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 927ba73..9d17238 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -239,6 +239,8 @@ struct kvm_mmu_page { int clear_spte_count; #endif + int write_flooding_count; + struct rcu_head rcu; }; @@ -353,10 +355,6 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_page_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; - gfn_t last_pt_write_gfn; - int last_pt_write_count; - u64 *last_pte_updated; - struct fpu guest_fpu; u64 xcr0; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index adaa160..3230f84 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1695,6 +1695,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, } else if (sp->unsync) kvm_mmu_mark_parents_unsync(sp); + sp->write_flooding_count = 0; trace_kvm_mmu_get_page(sp, false); return sp; } @@ -1847,15 +1848,6 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) mmu_page_remove_parent_pte(sp, parent_pte); } -static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm) -{ - int i; - struct kvm_vcpu *vcpu; - - kvm_for_each_vcpu(i, vcpu, kvm) - vcpu->arch.last_pte_updated = NULL; -} - static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 *parent_pte; @@ -1915,7 +1907,6 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, } sp->role.invalid = 1; - kvm_mmu_reset_last_pte_updated(kvm); return ret; } @@ -2360,8 +2351,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } } kvm_release_pfn_clean(pfn); - if (speculative) - vcpu->arch.last_pte_updated = sptep; } static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) @@ -3522,13 +3511,6 @@ static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, kvm_mmu_flush_tlb(vcpu); } -static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu) -{ - u64 *spte = vcpu->arch.last_pte_updated; - - return !!(spte && (*spte & shadow_accessed_mask)); -} - static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa, const u8 *new, int *bytes) { @@ -3569,22 +3551,14 @@ static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa, * If we're seeing too many writes to a page, it may no longer be a page table, * or we may be forking, in which case it is better to unmap the page. */ -static bool detect_write_flooding(struct kvm_vcpu *vcpu, gfn_t gfn) +static bool detect_write_flooding(struct kvm_mmu_page *sp, u64 *spte) { - bool flooded = false; - - if (gfn == vcpu->arch.last_pt_write_gfn - && !last_updated_pte_accessed(vcpu)) { - ++vcpu->arch.last_pt_write_count; - if (vcpu->arch.last_pt_write_count >= 3) - flooded = true; - } else { - vcpu->arch.last_pt_write_gfn = gfn; - vcpu->arch.last_pt_write_count = 1; - vcpu->arch.last_pte_updated = NULL; - } + if (spte && !(*spte & shadow_accessed_mask)) + sp->write_flooding_count++; + else + sp->write_flooding_count = 0; - return flooded; + return sp->write_flooding_count >= 3; } /* @@ -3656,7 +3630,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, LIST_HEAD(invalid_list); u64 entry, gentry, *spte; int npte; - bool remote_flush, local_flush, zap_page, flooded, misaligned; + bool remote_flush, local_flush, zap_page; /* * If we don't have indirect shadow pages, it means no page is @@ -3675,12 +3649,12 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, ++vcpu->kvm->stat.mmu_pte_write; trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE); - flooded = detect_write_flooding(vcpu, gfn); mask.cr0_wp = mask.cr4_pae = mask.nxe = 1; for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn, node) { - misaligned = detect_write_misaligned(sp, gpa, bytes); + spte = get_written_sptes(sp, gpa, &npte); - if (misaligned || flooded) { + if (detect_write_misaligned(sp, gpa, bytes) || + detect_write_flooding(sp, spte)) { zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list); ++vcpu->kvm->stat.mmu_flooded; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index bdc2241..ec5c1b4 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -599,11 +599,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, */ if (!r) { pgprintk("%s: guest page fault\n", __func__); - if (!prefault) { + if (!prefault) inject_page_fault(vcpu, &walker.fault); - /* reset fork detector */ - vcpu->arch.last_pt_write_count = 0; - } + return 0; } @@ -641,9 +639,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, pgprintk("%s: shadow pte %p %llx emulate %d\n", __func__, sptep, *sptep, emulate); - if (!emulate) - vcpu->arch.last_pt_write_count = 0; /* reset fork detector */ - ++vcpu->stat.pf_fixed; trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); spin_unlock(&vcpu->kvm->mmu_lock);