From patchwork Wed Aug 5 08:17:12 2009
X-Patchwork-Submitter: Avi Kivity
X-Patchwork-Id: 39298
Message-ID: <4A794008.6030204@redhat.com>
Date: Wed, 05 Aug 2009 11:17:12 +0300
From: Avi Kivity
To: Wu Fengguang
CC: Rik van Riel, "Dike, Jeffrey G", "Yu, Wilfred", "Kleen, Andi",
 Andrea Arcangeli, Hugh Dickins, Andrew Morton, Christoph Lameter,
 KOSAKI Motohiro, Mel Gorman, LKML, linux-mm, KVM list
Subject: Re: [RFC] respect the referenced bit of KVM guest pages?
References: <20090805024058.GA8886@localhost> <4A793B92.9040204@redhat.com>
In-Reply-To: <4A793B92.9040204@redhat.com>

On 08/05/2009 10:58 AM, Avi Kivity wrote:
> On 08/05/2009 05:40 AM, Wu Fengguang wrote:
>> Greetings,
>>
>> Jeff Dike found that many KVM pages are being refaulted in 2.6.29:
>>
>> "Lots of pages are being discarded due to memory pressure only to be
>> faulted back in soon after. These pages are nearly all stack pages.
>> This is not consistent - sometimes there are relatively few such pages
>> and they are spread out between processes."
>>
>> The refaults can be drastically reduced by the following patch, which
>> respects the referenced bit of all anonymous pages (including the KVM
>> pages).
>>
>> However it risks reintroducing the problem addressed by commit 7e9cd4842
>> (fix reclaim scalability problem by ignoring the referenced bit,
>> mainly the pte young bit). I wonder if there are better solutions?
>
> How do you distinguish between KVM pages and non-KVM anonymous pages?
> More importantly, why should you?
>
> Jeff, do you see the refaults on Nehalem systems? If so, that's
> likely due to the lack of an accessed bit on EPT page tables. It would
> be interesting to compare with Barcelona (which does have one).
>
> If that's indeed the case, we can have the EPT ageing mechanism give
> pages a bit more time around by using an available bit in the EPT PTEs
> to return accessed on the first pass and not-accessed on the second.
> The attached patch implements this.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7b53614..310938a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -195,6 +195,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static int __read_mostly shadow_accessed_shift;
 
 static inline u64 rsvd_bits(int s, int e)
 {
@@ -219,6 +220,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
+	shadow_accessed_shift
+		= find_first_bit((void *)&shadow_accessed_mask, 64);
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
@@ -817,11 +820,11 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	while (spte) {
 		int _young;
 		u64 _spte = *spte;
-		BUG_ON(!(_spte & PT_PRESENT_MASK));
-		_young = _spte & PT_ACCESSED_MASK;
+		BUG_ON(!(_spte & shadow_accessed_mask));
+		_young = _spte & shadow_accessed_mask;
 		if (_young) {
 			young = 1;
-			clear_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+			clear_bit(shadow_accessed_shift, (unsigned long *)spte);
 		}
 		spte = rmap_next(kvm, rmapp, spte);
 	}
@@ -2572,7 +2575,7 @@ static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
 	    && shadow_accessed_mask
 	    && !(*spte & shadow_accessed_mask)
 	    && is_shadow_present_pte(*spte))
-		set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
+		set_bit(shadow_accessed_shift, (unsigned long *)spte);
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0ba706e..bc99367 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4029,7 +4029,7 @@ static int __init vmx_init(void)
 		bypass_guest_pf = 0;
 		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
 			VMX_EPT_WRITABLE_MASK);
-		kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
+		kvm_mmu_set_mask_ptes(0ull, 1ull << 63, 0ull, 0ull,
 			VMX_EPT_EXECUTABLE_MASK);
 		kvm_enable_tdp();
 	} else