From: Max Laier
Organization: FreeBSD
To: kvm@vger.kernel.org
Subject: RFC: shadow page table reclaim
Date: Fri, 28 Aug 2009 04:31:04 +0200
Message-Id: <200908280431.04960.max@laiers.net>

Hello,

it seems to me that the reclaim mechanism for shadow page table pages is
sub-optimal.  The arch.active_mmu_pages list that is used for reclaiming does
not move parent shadow page tables up when a child is added, so when we need
a new shadow page we zap the oldest one - which may well be a directory-level
page holding a just-added table-level page.

Attached is a proof-of-concept diff and two plots, before and after.  The
plots show referenced guest pages over time.  As you can see, there is less
saw-toothing in the "after" plot and also fewer changes overall (because we
zap mappings that are still in use less often).  This is with the shadow page
table limit set to 64 (to increase the effect) and with vmx/ept.

I realize that the list_move and the parent walk are quite expensive, and
that kvm_mmu_alloc_page is only half the story.  It should really be done
every time a new guest page table is mapped - maybe via rmap_add.  That would
obviously be a complete killer performance-wise, though.  Another idea would
be to improve the reclaim logic so that it prefers "old" PT_PAGE_TABLE_LEVEL
pages over directories, though I'm not sure how to code that up sensibly
either; a rough sketch of what I have in mind is appended after the diff.

As I said, this is a proof of concept and an RFC, so any comments are
welcome.  For my use case the proof-of-concept diff seems to do well enough,
though.
Thanks,

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95d5329..0a63570 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -190,6 +190,8 @@ struct kvm_unsync_walk {
 };
 
 typedef int (*mmu_parent_walk_fn) (struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp);
+static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+			    mmu_parent_walk_fn fn);
 
 static struct kmem_cache *pte_chain_cache;
 static struct kmem_cache *rmap_desc_cache;
@@ -900,6 +902,12 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
 	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
 }
 
+static int move_up_walk_fn(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+	list_move(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
+	return 1;
+}
+
 static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 					       u64 *parent_pte)
 {
@@ -918,6 +926,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
 	sp->multimapped = 0;
 	sp->parent_pte = parent_pte;
+#if 1
+	if (parent_pte)
+		mmu_parent_walk(vcpu, sp, move_up_walk_fn);
+#endif
 	--vcpu->kvm->arch.n_free_mmu_pages;
 	return sp;
 }
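
For illustration only, here is a rough, untested sketch of the leaf-preferring
victim selection mentioned above.  The helper name kvm_mmu_pick_reclaim_victim
and the scan budget are invented; it assumes the reclaim paths that currently
always zap the page at the tail of active_mmu_pages could be pointed at the
page this returns instead:

/*
 * Rough sketch (not in the tree): pick a reclaim victim from the tail of
 * active_mmu_pages, but prefer "old" last-level pages over directories.
 * Returns NULL if the list is empty; the scan budget limits how far we
 * walk away from the oldest entry.
 */
#define RECLAIM_SCAN_BUDGET	16

static struct kvm_mmu_page *kvm_mmu_pick_reclaim_victim(struct kvm *kvm)
{
	struct kvm_mmu_page *sp;
	struct kvm_mmu_page *oldest = NULL;
	int budget = RECLAIM_SCAN_BUDGET;

	/* Walk from the tail (oldest) towards the head (newest). */
	list_for_each_entry_reverse(sp, &kvm->arch.active_mmu_pages, link) {
		if (!oldest)
			oldest = sp;	/* plain oldest page as fallback */
		if (sp->role.level == PT_PAGE_TABLE_LEVEL)
			return sp;	/* old leaf page: preferred victim */
		if (--budget == 0)
			break;
	}

	/* No leaf page within the scan budget - fall back to the oldest. */
	return oldest;
}

The reclaim code would then call kvm_mmu_zap_page() on whatever this returns
instead of always taking active_mmu_pages.prev.  This is only meant to make
the idea concrete, not a tested alternative to the diff above.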