From patchwork Tue Sep 1 21:41:16 2015
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 7106951
From: David Matlack
To: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, pbonzini@redhat.com, wanpeng.li@hotmail.com,
	peter@kieser.ca, David Matlack
Subject: [PATCH 2/2] kvm: adaptive halt-polling toggle
Date: Tue, 1 Sep 2015 14:41:16 -0700
Message-Id: <1441143676-9375-3-git-send-email-dmatlack@google.com>
In-Reply-To: <1441143676-9375-1-git-send-email-dmatlack@google.com>
References: <1441143676-9375-1-git-send-email-dmatlack@google.com>
X-Mailing-List: kvm@vger.kernel.org

This patch removes almost all of the overhead of polling for idle VCPUs
by disabling polling for long halts. The length of the previous halt is
used as a predictor for the current halt:

  if (length of previous halt < halt_poll_ns):
    poll for halt_poll_ns
  else:
    don't poll

This tends to work well in practice. For VMs running message-passing
workloads, all halts are short and so the VCPU should always poll. When
a VCPU is idle, all halts are long and so the VCPU should never poll.
Experimental results on an IvyBridge host show that adaptive toggling
gets close to the best of both worlds.

                          no-poll      always-poll    adaptive-toggle
---------------------------------------------------------------------
Idle (nohz) VCPU %c0      0.12         0.32           0.15
Idle (250HZ) VCPU %c0     1.22         6.35           1.27
TCP_RR latency            39 us        25 us          25 us

(3.16 Linux guest, halt_poll_ns=200000)

The big win is with ticking operating systems. Running the Linux guest
with nohz=off (and HZ=250), the adaptive toggle saves 5% CPU/second and
gets close to no-polling overhead levels. The savings should be even
higher for higher-frequency ticks.

Signed-off-by: David Matlack
---
 include/trace/events/kvm.h |  23 ++++++----
 virt/kvm/kvm_main.c        | 110 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 97 insertions(+), 36 deletions(-)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index a44062d..34e0b11 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -38,22 +38,27 @@ TRACE_EVENT(kvm_userspace_exit,
 );
 
 TRACE_EVENT(kvm_vcpu_wakeup,
-	TP_PROTO(__u64 ns, bool waited),
-	TP_ARGS(ns, waited),
+	TP_PROTO(bool poll, bool success, __u64 poll_ns, __u64 wait_ns),
+	TP_ARGS(poll, success, poll_ns, wait_ns),
 
 	TP_STRUCT__entry(
-		__field( __u64, ns )
-		__field( bool, waited )
+		__field( bool, poll )
+		__field( bool, success )
+		__field( __u64, poll_ns )
+		__field( __u64, wait_ns )
 	),
 
 	TP_fast_assign(
-		__entry->ns = ns;
-		__entry->waited = waited;
+		__entry->poll = poll;
+		__entry->success = success;
+		__entry->poll_ns = poll_ns;
+		__entry->wait_ns = wait_ns;
 	),
 
-	TP_printk("%s time %lld ns",
-		  __entry->waited ? "wait" : "poll",
-		  __entry->ns)
+	TP_printk("%s %s, poll ns %lld, wait ns %lld",
+		  __entry->poll ? "poll" : "wait",
+		  __entry->success ? "success" : "fail",
+		  __entry->poll_ns, __entry->wait_ns)
 );
 
 #if defined(CONFIG_HAVE_KVM_IRQFD)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 977ffb1..3a66694 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -66,7 +66,8 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
-static unsigned int halt_poll_ns;
+/* The maximum amount of time a vcpu will poll for interrupts while halted. */
+static unsigned int halt_poll_ns = 200000;
 module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
 
 /*
@@ -1907,6 +1908,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);
 
+/* This sets KVM_REQ_UNHALT if an interrupt arrives. */
 static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 {
 	if (kvm_arch_vcpu_runnable(vcpu)) {
@@ -1921,47 +1923,101 @@ static int kvm_vcpu_check_block(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-/*
- * The vCPU has executed a HLT instruction with in-kernel mode enabled.
- */
-void kvm_vcpu_block(struct kvm_vcpu *vcpu)
+static void
+update_vcpu_block_predictor(struct kvm_vcpu *vcpu, u64 poll_ns, u64 wait_ns)
 {
-	ktime_t start, cur;
-	DEFINE_WAIT(wait);
-	bool waited = false;
-
-	start = cur = ktime_get();
-	if (vcpu->halt_poll_ns) {
-		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
-
-		do {
-			/*
-			 * This sets KVM_REQ_UNHALT if an interrupt
-			 * arrives.
-			 */
-			if (kvm_vcpu_check_block(vcpu) < 0) {
-				++vcpu->stat.halt_successful_poll;
-				goto out;
-			}
-			cur = ktime_get();
-		} while (single_task_running() && ktime_before(cur, stop));
+	u64 block_ns = poll_ns + wait_ns;
+
+	if (block_ns <= vcpu->halt_poll_ns)
+		return;
+
+	if (block_ns < halt_poll_ns)
+		/* we had a short block and our poll time is too small */
+		vcpu->halt_poll_ns = halt_poll_ns;
+	else
+		/* we had a long block. disable polling. */
+		vcpu->halt_poll_ns = 0;
+}
+
+static bool kvm_vcpu_try_poll(struct kvm_vcpu *vcpu, u64 *poll_ns)
+{
+	bool done = false;
+	ktime_t deadline;
+	ktime_t start;
+
+	start = ktime_get();
+	deadline = ktime_add_ns(start, vcpu->halt_poll_ns);
+
+	while (single_task_running() && ktime_before(ktime_get(), deadline)) {
+		if (kvm_vcpu_check_block(vcpu) < 0) {
+			++vcpu->stat.halt_successful_poll;
+			done = true;
+			break;
+		}
 	}
 
+	*poll_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
+	return done;
+}
+
+static void kvm_vcpu_wait(struct kvm_vcpu *vcpu, u64 *wait_ns)
+{
+	DEFINE_WAIT(wait);
+	ktime_t start;
+
+	start = ktime_get();
+
 	for (;;) {
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
 
-		waited = true;
 		schedule();
 	}
 
 	finish_wait(&vcpu->wq, &wait);
-	cur = ktime_get();
+
+	*wait_ns = ktime_to_ns(ktime_sub(ktime_get(), start));
+}
+
+void __kvm_vcpu_block(struct kvm_vcpu *vcpu)
+{
+	bool prediction_success = false;
+	u64 poll_ns = 0;
+	u64 wait_ns = 0;
+
+	if (vcpu->halt_poll_ns && kvm_vcpu_try_poll(vcpu, &poll_ns)) {
+		prediction_success = true;
+		goto out;
+	}
+
+	kvm_vcpu_wait(vcpu, &wait_ns);
+
+	if (!vcpu->halt_poll_ns && wait_ns > halt_poll_ns)
+		prediction_success = true;
 
 out:
-	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
+	trace_kvm_vcpu_wakeup(vcpu->halt_poll_ns, prediction_success,
+			      poll_ns, wait_ns);
+
+	update_vcpu_block_predictor(vcpu, poll_ns, wait_ns);
+}
+
+/*
+ * The vCPU has executed a HLT instruction with in-kernel mode enabled.
+ */
+void kvm_vcpu_block(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * kvm_vcpu_block can be called more than once between vcpu resumes.
+	 * All calls except the first will always return immediately. We don't
+	 * want those calls to affect poll/wait prediction, so we return here.
+	 */
+	if (kvm_vcpu_check_block(vcpu) < 0)
+		return;
+
+	__kvm_vcpu_block(vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_block);
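
For readers who want to see the toggle in isolation, below is a small,
standalone user-space sketch (not part of the patch) that mirrors the
decision rule in update_vcpu_block_predictor(). The names fake_vcpu,
update_predictor() and HALT_POLL_NS are illustrative stand-ins for the
kernel's vcpu state, predictor function and module parameter, and the
halt durations are made up; it only demonstrates how short blocks
enable polling and long blocks disable it.

    /* Standalone illustration; compile with: cc -o toggle toggle.c */
    #include <stdio.h>
    #include <stdint.h>

    #define HALT_POLL_NS 200000ULL  /* stands in for the halt_poll_ns module parameter */

    struct fake_vcpu {
            uint64_t halt_poll_ns;  /* 0 means "do not poll on the next halt" */
    };

    /* Same decision rule as the patch's update_vcpu_block_predictor(). */
    static void update_predictor(struct fake_vcpu *vcpu, uint64_t block_ns)
    {
            if (block_ns <= vcpu->halt_poll_ns)
                    return;                            /* woke up inside the poll window; keep the setting */
            if (block_ns < HALT_POLL_NS)
                    vcpu->halt_poll_ns = HALT_POLL_NS; /* short block: poll on the next halt */
            else
                    vcpu->halt_poll_ns = 0;            /* long block: disable polling */
    }

    int main(void)
    {
            /* made-up halt lengths in ns: short message-passing halts, then long idle halts */
            uint64_t halts[] = { 50000, 40000, 60000, 5000000, 8000000, 30000 };
            struct fake_vcpu vcpu = { .halt_poll_ns = 0 };
            unsigned int i;

            for (i = 0; i < sizeof(halts) / sizeof(halts[0]); i++) {
                    printf("halt %u: %7llu ns, polled at entry: %s\n", i,
                           (unsigned long long)halts[i],
                           vcpu.halt_poll_ns ? "yes" : "no");
                    update_predictor(&vcpu, halts[i]);
            }
            return 0;
    }

Fed the sample durations above, the sketch polls after each short halt
and stops polling once the long idle halts begin, which is the behavior
the table in the commit message measures.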