Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch
From: Xiao Guangrong
Date: Thu, 30 May 2013 00:03:44 +0800
To: Xiao Guangrong
Cc: Marcelo Tosatti, gleb@redhat.com, avi.kivity@gmail.com,
    pbonzini@redhat.com, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Message-ID: <51A626E0.9030308@linux.vnet.ibm.com>
In-Reply-To: <51A60A64.2080509@linux.vnet.ibm.com>

On 05/29/2013 10:02 PM, Xiao Guangrong wrote:
> On 05/29/2013 09:32 PM, Marcelo Tosatti wrote:
>> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>>> This information is from my reply to Gleb's mail, where he raised the
>>> question of why the "collapse tlb flush" is needed:
>>>
>>> ======
>>> It seems not.
>>> Since we have reloaded the mmu before zapping the obsolete pages, the
>>> mmu-lock is easily contended.
>>> I did this simple tracking:
>>>
>>> +	int num = 0;
>>>  restart:
>>>  	list_for_each_entry_safe_reverse(sp, node,
>>>  	      &kvm->arch.active_mmu_pages, link) {
>>> @@ -4265,6 +4265,7 @@ restart:
>>>  		if (batch >= BATCH_ZAP_PAGES &&
>>>  		      cond_resched_lock(&kvm->mmu_lock)) {
>>>  			batch = 0;
>>> +			num++;
>>>  			goto restart;
>>>  		}
>>>
>>> @@ -4277,6 +4278,7 @@ restart:
>>>  	 * may use the pages.
>>>  	 */
>>>  	kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>> +	printk("lock-break: %d.\n", num);
>>>  }
>>>
>>> I read the pci rom while doing a kernel build in the guest, which has
>>> 1G memory and 4 vcpus with ept enabled; this is a normal workload and
>>> a normal configuration.
>>>
>>> # dmesg
>>> [ 2338.759099] lock-break: 8.
>>> [ 2339.732442] lock-break: 5.
>>> [ 2340.904446] lock-break: 3.
>>> [ 2342.513514] lock-break: 3.
>>> [ 2343.452229] lock-break: 3.
>>> [ 2344.981599] lock-break: 4.
>>>
>>> Basically, we need to break many times.
>>
>> Should measure kvm_mmu_zap_all latency.
>>
>>> ======
>>>
>>> You can see we still need to break 3 times to zap all the pages even
>>> though we zap 10 pages in a batch. Obviously it would need to break
>>> more times without batch-zapping.
>>
>> Again, breaking should be no problem, what matters is latency. Please
>> measure kvm_mmu_zap_all latency after all optimizations to justify
>> this minimum batching.
>
> Okay, okay. I will benchmark the latency.

Okay, I have done the test. The test environment is the same as above:
"I read the pci rom while doing a kernel build in the guest, which has
1G memory and 4 vcpus with ept enabled; this is a normal workload and a
normal configuration."

Batch-zapped:

Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
             total       used       free     shared    buffers     cached
Mem:           975        793        181          0          6        438
-/+ buffers/cache:        347        627
Swap:         2015         43       1972

Host shows:
[ 2229.918558] lock-break: 5.
[ 2229.918564] kvm_mmu_invalidate_zap_all_pages: 174706e.

No-batch:

Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
             total       used       free     shared    buffers     cached
Mem:           975        843        131          0         17        476
-/+ buffers/cache:        348        626
Swap:         2015          2

Host shows:
[ 2931.675285] lock-break: 13.
[ 2931.675291] kvm_mmu_invalidate_zap_all_pages: 69c1676.

That means, with nearly the same memory accessed in the guest:
- batch-zapped needs to break 5 times; the latency is 174706e.
- no-batch needs to break 13 times; the latency is 69c1676.
(The latencies are printed with %llx, i.e. they are hexadecimal
nanoseconds; see the decoding note after the patch below.)

The code change to track the latency:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 055d675..a66f21b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4233,13 +4233,13 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }
 
-#define BATCH_ZAP_PAGES	10
+#define BATCH_ZAP_PAGES	0
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
 	int batch = 0;
-
+	int num = 0;
 restart:
 	list_for_each_entry_safe_reverse(sp, node,
 	      &kvm->arch.active_mmu_pages, link) {
@@ -4265,6 +4265,7 @@ restart:
 		if (batch >= BATCH_ZAP_PAGES &&
 		      cond_resched_lock(&kvm->mmu_lock)) {
 			batch = 0;
+			num++;
 			goto restart;
 		}
 
@@ -4277,6 +4278,7 @@ restart:
 	 * may use the pages.
 	 */
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	printk("lock-break: %d.\n", num);
 }
 
 /*
@@ -4290,7 +4292,12 @@ restart:
  */
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 {
+	u64 start;
+
 	spin_lock(&kvm->mmu_lock);
+
+	start = local_clock();
+
 	trace_kvm_mmu_invalidate_zap_all_pages(kvm);
 	kvm->arch.mmu_valid_gen++;
 
@@ -4306,6 +4313,9 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 	kvm_reload_remote_mmus(kvm);
 
 	kvm_zap_obsolete_pages(kvm);
+
+	printk("%s: %llx.\n", __FUNCTION__, local_clock() - start);
+
 	spin_unlock(&kvm->mmu_lock);
 }
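
A note on reading the numbers above: local_clock() returns a time in
nanoseconds, and the patch prints the delta with %llx, so the dmesg
figures are hexadecimal nanoseconds. A small standalone program (purely
illustrative; the two constants are simply copied from the dmesg lines
above) decodes them:

#include <stdio.h>

int main(void)
{
	/* local_clock() deltas printed by the patch above, in hex ns */
	unsigned long long batched  = 0x174706eULL; /* BATCH_ZAP_PAGES = 10 */
	unsigned long long no_batch = 0x69c1676ULL; /* BATCH_ZAP_PAGES = 0  */

	printf("batched : %llu ns (%.1f ms)\n", batched, batched / 1e6);
	printf("no-batch: %llu ns (%.1f ms)\n", no_batch, no_batch / 1e6);
	printf("ratio   : %.1fx\n", (double)no_batch / batched);
	return 0;
}

This prints about 24.4 ms for the batched run and about 110.9 ms for the
no-batch run, so batching cuts the kvm_mmu_invalidate_zap_all_pages
latency by roughly 4.5x in this test.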
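
For background on the pattern being benchmarked: kvm_zap_obsolete_pages()
zaps obsolete shadow pages in batches of BATCH_ZAP_PAGES and, between
batches, lets cond_resched_lock() drop mmu_lock when it is contended or a
reschedule is pending, restarting the list walk afterwards. Below is a
minimal userspace sketch of that shape, not the kernel code itself:
pthread_mutex_t and sched_yield() stand in for the spinlock and
cond_resched_lock(), the page list is a plain counter, and
zap_all_batched() is a made-up name.

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define BATCH_ZAP_PAGES 10

static pthread_mutex_t zap_lock = PTHREAD_MUTEX_INITIALIZER;
static int obsolete_pages = 47;	/* stand-in for the obsolete-page list */

static void zap_all_batched(void)
{
	int batch = 0, num = 0;

	pthread_mutex_lock(&zap_lock);
restart:
	while (obsolete_pages > 0) {
		obsolete_pages--;	/* "zap" one page */

		if (++batch >= BATCH_ZAP_PAGES) {
			/*
			 * Drop the lock so waiters (vcpu threads taking
			 * mmu_lock, in the real code) can get in, then
			 * restart the walk: the list may have changed
			 * while the lock was dropped.  The kernel's
			 * cond_resched_lock() drops the lock only when
			 * it is contended or a reschedule is pending;
			 * this sketch drops it unconditionally.
			 */
			pthread_mutex_unlock(&zap_lock);
			sched_yield();
			pthread_mutex_lock(&zap_lock);
			batch = 0;
			num++;
			goto restart;
		}
	}
	pthread_mutex_unlock(&zap_lock);
	printf("lock-break: %d.\n", num);
}

int main(void)
{
	zap_all_batched();
	return 0;
}

With BATCH_ZAP_PAGES set to 0, the kernel code attempts the lock break on
every page, which is why the no-batch run above breaks 13 times instead
of 5 and ends up with the much larger total latency.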