From: Max Laier
Organization: FreeBSD
To: kvm@vger.kernel.org
Subject: RFC: shadow page table reclaim
Date: Fri, 28 Aug 2009 04:31:04 +0200
Message-Id: <200908280431.04960.max@laiers.net>

Hello,

it seems to me that the reclaim mechanism for shadow page table pages is
sub-optimal.  The arch.active_mmu_pages list that is used for reclaiming does
not move parent shadow page tables up when a child is added, so when we need
a new shadow page we zap the oldest one - which may well be a directory-level
page holding a just-added table-level page.

Attached is a proof-of-concept diff and two plots, before and after.  The
plots show referenced guest pages over time.  As you can see, there is less
saw-toothing in the "after" plot and also fewer changes overall (because we
zap mappings that are still in use less often).  This is with the shadow page
table limit set to 64 (to increase the effect) and with vmx/ept.

I realize that the list_move and the parent walk are quite expensive, and
that kvm_mmu_alloc_page is only half the story.  It should really be done
every time a new guest page table is mapped - maybe via rmap_add.  That would
obviously be a complete killer performance-wise, though.  Another idea would
be to improve the reclaim logic so that it prefers "old" PT_PAGE_TABLE_LEVEL
pages over directories, though I'm not sure how to code that up sensibly
either; a rough sketch of what I have in mind is appended after the diff.

As I said, this is a proof of concept and an RFC, so any comments are
welcome.  For my use case the proof-of-concept diff seems to do well enough,
though.
Thanks,

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95d5329..0a63570 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -190,6 +190,8 @@ struct kvm_unsync_walk {
 };
 
 typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			    mmu_parent_walk_fn fn);
 
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
@@ -900,6 +902,12 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
 }
 
+static int move_up_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	list_move(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
+	return 1;
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 					       u64 *parent_pte)
 {
@@ -918,6 +926,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 	sp->multimapped = 0;
 	sp->parent_pte = parent_pte;
+#if 1
+	if (parent_pte)
+		mmu_parent_walk(vcpu, sp, move_up_walk_fn);
+#endif
 	--vcpu->kvm->arch.n_free_mmu_pages;
 	return sp;
 }
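
For illustration only, here is a rough, untested sketch of the leaf-preferring
victim selection mentioned above.  The helper name kvm_mmu_pick_reclaim_victim
and the scan budget are invented; it assumes the reclaim paths that currently
always zap the page at the tail of active_mmu_pages could be pointed at the
page this returns instead:

/*
 * Rough sketch (not in the tree): pick a reclaim victim from the tail of
 * active_mmu_pages, but prefer "old" last-level pages over directories.
 * Returns NULL if the list is empty; the scan budget limits how far we
 * walk away from the oldest entry.
 */
#define RECLAIM_SCAN_BUDGET	16

static struct kvm_mmu_page *kvm_mmu_pick_reclaim_victim(struct kvm *kvm)
{
	struct kvm_mmu_page *sp;
	struct kvm_mmu_page *oldest = NULL;
	int budget = RECLAIM_SCAN_BUDGET;

	/* Walk from the tail (oldest) towards the head (newest). */
	list_for_each_entry_reverse(sp, &kvm->arch.active_mmu_pages, link) {
		if (!oldest)
			oldest = sp;	/* plain oldest page as fallback */
		if (sp->role.level == PT_PAGE_TABLE_LEVEL)
			return sp;	/* old leaf page: preferred victim */
		if (--budget == 0)
			break;
	}

	/* No leaf page within the scan budget - fall back to the oldest. */
	return oldest;
}

The reclaim code would then call kvm_mmu_zap_page() on whatever this returns
instead of always taking active_mmu_pages.prev.  This is only meant to make
the idea concrete, not a tested alternative to the diff above.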