From patchwork Thu Sep  3 14:07:38 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wanpeng Li <wanpeng.li@hotmail.com>
X-Patchwork-Id: 7119621
Message-ID: <BLU436-SMTP2016CD0BF5AFCCDAD29C2C780570@phx.gbl>
From: Wanpeng Li <wanpeng.li@hotmail.com>
To: Paolo Bonzini <pbonzini@redhat.com>
CC: David Matlack <dmatlack@google.com>, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Wanpeng Li <wanpeng.li@hotmail.com>
Subject: [PATCH v7 2/3] KVM: dynamic halt-polling
Date: Thu, 3 Sep 2015 22:07:38 +0800
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1441289259-32072-1-git-send-email-wanpeng.li@hotmail.com>
References: <1441289259-32072-1-git-send-email-wanpeng.li@hotmail.com>

Always polling has a downside: polling still happens for idle vCPUs,
which wastes CPU time. This patch adds the ability to adjust
halt_poll_ns dynamically: grow halt_poll_ns when a short halt is
detected, and shrink halt_poll_ns when a long halt is detected.

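There are two new kernel module parameters for changing halt_poll_ns:
halt_poll_ns_grow (multiplier applied on a short halt, default 2) and
halt_poll_ns_shrink (divisor applied on a long halt; the default of 0
resets the per-vCPU halt_poll_ns to 0).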
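
The grow/shrink policy described above can be sketched in user space as
follows. This is a minimal illustration, not the kernel code: the
function names grow(), shrink() and adjust_poll_window() are hypothetical,
and the three globals simply mirror the defaults this patch gives the
module parameters.

```c
#include <stdint.h>

/* Stand-ins for the module parameters; values mirror the patch defaults. */
static unsigned int halt_poll_ns = 500000;    /* global ceiling, 500us */
static unsigned int halt_poll_ns_grow = 2;    /* multiplier on short halts */
static unsigned int halt_poll_ns_shrink = 0;  /* divisor; 0 means reset to 0 */

/* Hypothetical helper: grow the per-vCPU window, starting from a 10us base. */
static unsigned int grow(unsigned int val)
{
	if (val == 0 && halt_poll_ns_grow)
		return 10000;
	return val * halt_poll_ns_grow;
}

/* Hypothetical helper: shrink the per-vCPU window, or reset it to 0. */
static unsigned int shrink(unsigned int val)
{
	if (halt_poll_ns_shrink == 0)
		return 0;
	return val / halt_poll_ns_shrink;
}

/*
 * Hypothetical helper: return the new per-vCPU poll window after a halt
 * that blocked for block_ns nanoseconds.
 */
unsigned int adjust_poll_window(unsigned int vcpu_poll_ns, uint64_t block_ns)
{
	if (!halt_poll_ns)
		return vcpu_poll_ns;          /* polling disabled globally */
	if (block_ns <= vcpu_poll_ns)
		return vcpu_poll_ns;          /* window already covered the halt */
	if (vcpu_poll_ns && block_ns > halt_poll_ns)
		return shrink(vcpu_poll_ns);  /* long halt: shrink polling */
	if (vcpu_poll_ns < halt_poll_ns && block_ns < halt_poll_ns)
		return grow(vcpu_poll_ns);    /* short halt, window too small */
	return vcpu_poll_ns;
}
```

With the defaults, a vCPU that starts at 0 and sees a 5us halt moves to
the 10us base, a following 15us halt doubles that to 20us, and a 2ms
halt resets the window to 0.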

                        no-poll      always-poll    dynamic-poll
-----------------------------------------------------------------------
Idle (nohz) vCPU %c0     0.15%        0.3%            0.2%  
Idle (250HZ) vCPU %c0    1.1%         4.6%~14%        1.2%
TCP_RR latency           34us         27us            26.7us

"Idle (X) vCPU %c0" is the percent of time the physical cpu spent in
c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the
guest was tickless. (250HZ) means the guest was ticking at 250HZ.

The big win is with ticking operating systems. Running the Linux guest
with nohz=off (and HZ=250), dynamic-poll cuts c0 residency by 3.4~12.8
percentage points compared with always-poll (4.6%~14% down to 1.2%) and
gets close to no-poll overhead levels. The savings should be even
higher for higher frequency ticks.
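
The savings quoted above follow directly from the "Idle (250HZ)" row of
the table. A small sketch of the arithmetic, with the table values held
in tenths of a percent so the subtraction stays exact; the enum names
and c0_delta() are hypothetical and exist only for this illustration.

```c
/* Table values for "Idle (250HZ) vCPU %c0", in tenths of a percent. */
enum {
	NO_POLL_C0         = 11,   /* 1.1% */
	ALWAYS_POLL_C0_MIN = 46,   /* 4.6% */
	ALWAYS_POLL_C0_MAX = 140,  /* 14%  */
	DYNAMIC_POLL_C0    = 12    /* 1.2% */
};

/* Hypothetical helper: difference in tenths of a percentage point. */
int c0_delta(int from, int to)
{
	return from - to;
}
```

c0_delta(ALWAYS_POLL_C0_MIN, DYNAMIC_POLL_C0) is 34 (3.4 points) and
c0_delta(ALWAYS_POLL_C0_MAX, DYNAMIC_POLL_C0) is 128 (12.8 points),
while dynamic-poll sits only 0.1 point above no-poll.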

Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 virt/kvm/kvm_main.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c06e57c..d5e07e9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,9 +66,18 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-static unsigned int halt_poll_ns;
+/* halt polling only reduces halt latency by 5-7 us, 500us is enough */
+static unsigned int halt_poll_ns = 500000;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
 
+/* Default doubles per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_grow = 2;
+module_param(halt_poll_ns_grow, uint, S_IRUGO);
+
+/* Default resets per-vcpu halt_poll_ns. */
+static unsigned int halt_poll_ns_shrink;
+module_param(halt_poll_ns_shrink, uint, S_IRUGO);
+
 /*
  * Ordering of locks:
  *
@@ -1907,6 +1916,31 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
 
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	int val = vcpu->halt_poll_ns;
+
+	/* 10us base */
+	if (val == 0 && halt_poll_ns_grow)
+		val = 10000;
+	else
+		val *= halt_poll_ns_grow;
+
+	vcpu->halt_poll_ns = val;
+}
+
+static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
+{
+	int val = vcpu->halt_poll_ns;
+
+	if (halt_poll_ns_shrink == 0)
+		val = 0;
+	else
+		val /= halt_poll_ns_shrink;
+
+	vcpu->halt_poll_ns = val;
+}
+
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 {
 	if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1928,7 +1962,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 {
 	ktime_t start, cur;
 	DEFINE_WAIT(wait);
-	bool waited = false;
+	bool polled = false, waited = false;
+	u64 poll_ns = 0, wait_ns = 0, block_ns = 0;
 
 	start = cur = ktime_get();
 	if (vcpu->halt_poll_ns) {
@@ -1940,11 +1975,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 			 * arrives.
 			 */
 			if (kvm_vcpu_check_block(vcpu) < 0) {
+				polled = true;
 				++vcpu->stat.halt_successful_poll;
-				goto out;
+				break;
 			}
 			cur = ktime_get();
 		} while (single_task_running() && ktime_before(cur, stop));
+
+		poll_ns = ktime_to_ns(cur) - ktime_to_ns(start);
+		if (polled)
+			goto out;
 	}
 
 	for (;;) {
@@ -1959,9 +1999,24 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 
 	finish_wait(&vcpu->wq, &wait);
 	cur = ktime_get();
+	wait_ns = ktime_to_ns(cur) - ktime_to_ns(start) - poll_ns;
 
 out:
-	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
+	block_ns = poll_ns + wait_ns;
+
+	if (halt_poll_ns) {
+		if (block_ns <= vcpu->halt_poll_ns)
+			;
+		/* we had a long block, shrink polling */
+		else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
+			shrink_halt_poll_ns(vcpu);
+		/* we had a short halt and our poll time is too small */
+		else if (vcpu->halt_poll_ns < halt_poll_ns &&
+			block_ns < halt_poll_ns)
+			grow_halt_poll_ns(vcpu);
+	}
+
+	trace_kvm_vcpu_wakeup(block_ns, waited);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);