From patchwork Thu Jun 22 11:22:13 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yang Zhang
X-Patchwork-Id: 9804059
From: root
X-Google-Original-From: root
To: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, pbonzini@redhat.com
Cc: x86@kernel.org, corbet@lwn.net, tony.luck@intel.com, bp@alien8.de,
 peterz@infradead.org, mchehab@kernel.org, akpm@linux-foundation.org,
 krzk@kernel.org, jpoimboe@redhat.com, luto@kernel.org,
 borntraeger@de.ibm.com, thgarnie@google.com, rgerst@gmail.com,
 minipli@googlemail.com, douly.fnst@cn.fujitsu.com, nicstange@gmail.com,
 fweisbec@gmail.com, dvlasenk@redhat.com, bristot@redhat.com,
 yamada.masahiro@socionext.com, mika.westerberg@linux.intel.com,
 yu.c.chen@intel.com, aaron.lu@intel.com, rostedt@goodmis.org,
 me@kylehuey.com, len.brown@intel.com, prarit@redhat.com,
 hidehiro.kawai.ez@hitachi.com, fengtiantian@huawei.com, pmladek@suse.com,
 jeyu@redhat.com, Larry.Finger@lwfinger.net, zijun_hu@htc.com,
 luisbg@osg.samsung.com, johannes.berg@intel.com,
 niklas.soderlund+renesas@ragnatech.se, zlpnobody@gmail.com,
 adobriyan@gmail.com, fgao@ikuai8.com, ebiederm@xmission.com,
 subashab@codeaurora.org, arnd@arndb.de, matt@codeblueprint.co.uk,
 mgorman@techsingularity.net, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-edac@vger.kernel.org, kvm@vger.kernel.org,
 Yang Zhang
Subject: [PATCH 1/2] x86/idle: add halt poll for halt idle
Date: Thu, 22 Jun 2017 11:22:13 +0000
Message-Id: <1498130534-26568-2-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1498130534-26568-1-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal>
References: <1498130534-26568-1-git-send-email-root@ip-172-31-39-62.us-west-2.compute.internal>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Yang Zhang

This patch introduces a new
mechanism to poll for a while before entering the idle state.

David gave a talk at KVM Forum describing the problem that current KVM VMs
have when running message-passing workloads. There has also been some work
to improve performance in KVM, such as halt polling. But we still have 4 MSR
writes and a HLT vmexit when going into halt idle, which introduces a lot of
latency.

Halt polling in KVM provides the capability to not schedule out a VCPU when
it is the only task on its pCPU. Unlike that, this patch lets the VCPU poll
for a while when there is no work inside the VCPU, to eliminate the heavy
vmexits on idle entry and exit. The potential impact is that it costs more
CPU cycles, since we are polling, and it may affect other tasks waiting on
the same physical CPU in the host.

Here is the data I got when running the contextswitch benchmark
(https://github.com/tsuna/contextswitch):

before patch:
2000000 process context switches in 4822613801ns (2411.3ns/ctxsw)

after patch:
2000000 process context switches in 3584098241ns (1792.0ns/ctxsw)

Signed-off-by: Yang Zhang
---
 Documentation/sysctl/kernel.txt | 10 ++++++++++
 arch/x86/kernel/process.c       | 21 +++++++++++++++++++++
 include/linux/kernel.h          |  3 +++
 kernel/sched/idle.c             |  3 +++
 kernel/sysctl.c                 |  9 +++++++++
 5 files changed, 46 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index bac23c1..4e71bfe 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -63,6 +63,7 @@ show up in /proc/sys/kernel:
 - perf_event_max_stack
 - perf_event_max_contexts_per_stack
 - pid_max
+- poll_threshold_ns [ X86 only ]
 - powersave-nap [ PPC only ]
 - printk
 - printk_delay
@@ -702,6 +703,15 @@ kernel tries to allocate a number starting from this one.
 
 ==============================================================
 
+poll_threshold_ns: (X86 only)
+
+This parameter is used to control the maximum time to poll before going
+into the real idle state. By default, the value is 0, which means don't poll.
+It is recommended to set a non-zero value when running latency-bound
+workloads in a VM.
+
+==============================================================
+
 powersave-nap: (PPC only)
 
 If set, Linux-PPC will use the 'nap' mode of powersaving,
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0bb8842..6361783 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -39,6 +39,10 @@
 #include
 #include
 
+#ifdef CONFIG_HYPERVISOR_GUEST
+unsigned long poll_threshold_ns;
+#endif
+
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
  * no more per-task TSS's. The TSS size is kept cacheline-aligned
@@ -313,6 +317,23 @@ static inline void play_dead(void)
 }
 #endif
 
+#ifdef CONFIG_HYPERVISOR_GUEST
+void arch_cpu_idle_poll(void)
+{
+	ktime_t cur, stop;
+
+	if (poll_threshold_ns) {
+		cur = ktime_get();
+		stop = ktime_add_ns(cur, poll_threshold_ns);
+		do {
+			if (need_resched())
+				break;
+			cur = ktime_get();
+		} while (ktime_before(cur, stop));
+	}
+}
+#endif
+
 void arch_cpu_idle_enter(void)
 {
 	tsc_verify_tsc_adjust(false);
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 13bc08a..04cf774 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -460,6 +460,9 @@ extern __scanf(2, 0)
 extern int sysctl_panic_on_stackoverflow;
 
 extern bool crash_kexec_post_notifiers;
+#ifdef CONFIG_HYPERVISOR_GUEST
+extern unsigned long poll_threshold_ns;
+#endif
 
 /*
  * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 2a25a9e..e789f99 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -74,6 +74,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 }
 
 /* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_poll(void) { }
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
 void __weak arch_cpu_idle_exit(void) { }
@@ -219,6 +220,8 @@ static void do_idle(void)
 	 */
 
 	__current_set_polling();
+	arch_cpu_idle_poll();
+
 	tick_nohz_idle_enter();
 
 	while (!need_resched()) {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a..9174d57 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1203,6 +1203,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra2		= &one,
 	},
 #endif
+#ifdef CONFIG_HYPERVISOR_GUEST
+	{
+		.procname	= "poll_threshold_ns",
+		.data		= &poll_threshold_ns,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
+#endif
 	{ }
 };
 
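
For readers who want to experiment with the idea outside the kernel, the
bounded polling loop in arch_cpu_idle_poll() can be approximated in user
space. This is only a sketch under stated assumptions, not the kernel code:
work_pending is a hypothetical stand-in for need_resched(), now_ns() and
poll_before_idle() are illustrative names that do not exist in the patch, and
CLOCK_MONOTONIC takes the place of ktime_get().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the kernel's need_resched() condition. */
static atomic_bool work_pending;

/* Monotonic time in nanoseconds; plays the role of ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-poll for at most threshold_ns before the caller goes idle.
 * Returns true if work showed up during the window, false if the
 * window expired or polling is disabled (threshold_ns == 0).
 */
static bool poll_before_idle(uint64_t threshold_ns)
{
	uint64_t stop;

	if (!threshold_ns)
		return false;

	stop = now_ns() + threshold_ns;
	do {
		if (atomic_load(&work_pending))
			return true;
	} while (now_ns() < stop);

	return false;
}
```

A caller would invoke poll_before_idle() just before blocking and skip the
blocking call when it returns true. As in the patch, a threshold of 0 disables
polling, and a larger threshold trades CPU cycles for lower wake-up latency.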