From patchwork Mon Apr 4 20:46:07 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luiz Capitulino X-Patchwork-Id: 8744341 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id 14BE59F336 for ; Mon, 4 Apr 2016 20:46:34 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 29C162011E for ; Mon, 4 Apr 2016 20:46:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 327FF20219 for ; Mon, 4 Apr 2016 20:46:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756472AbcDDUqO (ORCPT ); Mon, 4 Apr 2016 16:46:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60791 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751632AbcDDUqN (ORCPT ); Mon, 4 Apr 2016 16:46:13 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 13B90C00B8D7; Mon, 4 Apr 2016 20:46:12 +0000 (UTC) Received: from localhost (ovpn-116-102.ams2.redhat.com [10.36.116.102]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u34Kk8iN032624; Mon, 4 Apr 2016 16:46:09 -0400 Date: Mon, 4 Apr 2016 16:46:07 -0400 From: Luiz Capitulino To: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org, pbonzini@redhat.com, rkrcmar@redhat.com, mtosatti@redhat.com, riel@redhat.com, bsd@redhat.com Subject: [PATCH] kvm: x86: make lapic hrtimer pinned Message-ID: <20160404164607.09e306fa@redhat.com> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-7.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When a vCPU runs on a nohz_full core, the hrtimer used by the lapic emulation code can be migrated to another core. When this happens, it's possible to observe milisecond latency when delivering timer IRQs to KVM guests. The huge latency is mainly due to the fact that apic_timer_fn() expects to run during a kvm exit. It sets KVM_REQ_PENDING_TIMER and let it be handled on kvm entry. However, if the timer fires on a different core, we have to wait until the next kvm exit for the guest to see KVM_REQ_PENDING_TIMER set. This problem became visible after commit 9642d18ee. This commit changed the timer migration code to always attempt to migrate timers away from nohz_full cores. While it's discussable if this is correct/desirable (I don't think it is), it's clear that the lapic emulation code has a requirement on firing the hrtimer in the same core where it was started. This is achieved by making the hrtimer pinned. Lastly, note that KVM has code to migrate timers when a vCPU is scheduled to run in different core. However, this forced migration may fail. When this happens, we can have the same problem. If we want 100% correctness, we'll have to modify apic_timer_fn() to cause a kvm exit when it runs on a different core than the vCPU. Not sure if this is possible. Here's a reproducer for the issue being fixed: 1. Set all cores but core0 to be nohz_full cores 2. Start a guest with a single vCPU 3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs() You'll see that apic_timer_fn() will run in core0 while kvm_inject_apic_timer_irqs() runs in a different core. If you get both on core0, try running a program that takes 100% of the CPU and pin it to core0 to force the vCPU out. Signed-off-by: Luiz Capitulino Acked-by: Rik van Riel --- arch/x86/kvm/lapic.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 443d2a5..1a2da0e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1369,7 +1369,7 @@ static void start_apic_timer(struct kvm_lapic *apic) hrtimer_start(&apic->lapic_timer.timer, ktime_add_ns(now, apic->lapic_timer.period), - HRTIMER_MODE_ABS); + HRTIMER_MODE_ABS_PINNED); apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016" PRIx64 ", " @@ -1402,7 +1402,7 @@ static void start_apic_timer(struct kvm_lapic *apic) expire = ktime_add_ns(now, ns); expire = ktime_sub_ns(expire, lapic_timer_advance_ns); hrtimer_start(&apic->lapic_timer.timer, - expire, HRTIMER_MODE_ABS); + expire, HRTIMER_MODE_ABS_PINNED); } else apic_timer_expired(apic); @@ -1868,7 +1868,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) apic->vcpu = vcpu; hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); + HRTIMER_MODE_ABS_PINNED); apic->lapic_timer.timer.function = apic_timer_fn; /* @@ -2003,7 +2003,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu) timer = &vcpu->arch.apic->lapic_timer.timer; if (hrtimer_cancel(timer)) - hrtimer_start_expires(timer, HRTIMER_MODE_ABS); + hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED); } /*