From patchwork Thu Jun 30 00:52:50 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Wanpeng Li
X-Patchwork-Id: 9206391
From: Wanpeng Li
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: Wanpeng Li, Paolo Bonzini, Radim Krčmář, Yunhong Jiang
Subject: [PATCH v3 2/2] KVM: x86: fix underflow in TSC deadline calculation
Date: Thu, 30 Jun 2016 08:52:50 +0800
Message-Id: <1467247970-2986-2-git-send-email-wanpeng.li@hotmail.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1467247970-2986-1-git-send-email-wanpeng.li@hotmail.com>
References: <1467247970-2986-1-git-send-email-wanpeng.li@hotmail.com>
X-Mailing-List: kvm@vger.kernel.org

From: Wanpeng Li

INFO: rcu_sched detected stalls on CPUs/tasks:
 1-...: (11800 GPs behind) idle=45d/140000000000000/0 softirq=0/0 fqs=21663
 (detected by 0, t=65016 jiffies, g=11500, c=11499, q=719)
Task dump for CPU 1:
qemu-system-x86 R  running task        0  3529   3525 0x00080808
 ffff8802021791a0 ffff880212895040 0000000000000001 00007f1c2c00db40
 ffff8801dd20fcd3 ffffc90002b98000 ffff8801dd20fc88 ffff8801dd20fcf8
 0000000000000286 ffff8801dd2ac538 ffff8801dd20fcc0 ffffffffc06949c9
Call Trace:
 ? kvm_write_guest_cached+0xb9/0x160 [kvm]
 ? __delay+0xf/0x20
 ? wait_lapic_expire+0x14a/0x200 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xcbe/0x1b00 [kvm]
 ? kvm_arch_vcpu_ioctl_run+0xe34/0x1b00 [kvm]
 ? kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
 ? __fget+0x5/0x210
 ? do_vfs_ioctl+0x96/0x6a0
 ? __fget_light+0x2a/0x90
 ? SyS_ioctl+0x79/0x90
 ? do_syscall_64+0x7c/0x1e0
 ? entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced readily by running a full dynticks guest (since
hrtimers are heavily used in such a guest) with lapic_timer_advance
disabled.

If we fail to program the hardware preemption timer, we fall back to the
hrtimer based method. However, a previously programmed preemption timer
is not cancelled in this scenario, so one hardware preemption timer and
one hrtimer emulated tsc deadline timer run simultaneously. As a result,
the target guest deadline tsc is sometimes earlier than the guest tsc,
the computation in vmx_set_hv_timer underflows and sets delta_tsc to a
huge value, and the host soft lockup shown above follows.

Fix it by cancelling the previously programmed preemption timer, if there
is one, once we fail to program the new preemption timer and fall back to
the hrtimer based method.

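
To illustrate the arithmetic behind the underflow described above, here is
a minimal userspace sketch, not the kernel code. The helper names
naive_delta_tsc and clamped_delta_tsc are hypothetical. The clamp is only
one possible guard, shown for contrast; it is not what this patch does
(the patch cancels the stale preemption timer instead).

```c
#include <stdint.h>

/*
 * Hypothetical helper: unsigned subtraction as a TSC delta.
 * If the deadline is already in the past (deadline_tsc < guest_tsc),
 * the result wraps modulo 2^64 and becomes a huge positive value.
 */
uint64_t naive_delta_tsc(uint64_t guest_tsc, uint64_t deadline_tsc)
{
	return deadline_tsc - guest_tsc;
}

/*
 * Hypothetical guard, for illustration only: treat an already expired
 * deadline as "fire immediately" by clamping the delta to zero.
 */
uint64_t clamped_delta_tsc(uint64_t guest_tsc, uint64_t deadline_tsc)
{
	return deadline_tsc > guest_tsc ? deadline_tsc - guest_tsc : 0;
}
```

With guest_tsc = 1000 and deadline_tsc = 900, the naive helper yields
2^64 - 100, which is the kind of huge delta that the message above blames
for the lockup.
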
Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Yunhong Jiang
Signed-off-by: Wanpeng Li
---
v2 -> v3:
 * depends on start_hv_tscdeadline caller to start sw_timer
 * move lapic_timer.pending check in start_hv_tscdeadline
v1 -> v2:
 * abstract the set_hv_timer and cancel_hv_tscdeadline

 arch/x86/kvm/lapic.c | 51 +++++++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9c20ac1..b44c587 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1366,27 +1366,35 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
 
+static bool start_hv_tscdeadline(struct kvm_lapic *apic)
+{
+	u64 tscdeadline = apic->lapic_timer.tscdeadline;
+
+	if (atomic_read(&apic->lapic_timer.pending) ||
+		kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
+		if (apic->lapic_timer.hv_timer_in_use)
+			cancel_hv_tscdeadline(apic);
+	} else {
+		apic->lapic_timer.hv_timer_in_use = true;
+		hrtimer_cancel(&apic->lapic_timer.timer);
+
+		/* In case the sw timer triggered in the window */
+		if (atomic_read(&apic->lapic_timer.pending))
+			cancel_hv_tscdeadline(apic);
+	}
+	trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
+			apic->lapic_timer.hv_timer_in_use);
+	return kvm_lapic_hv_timer_in_use(apic->vcpu);
+}
+
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
 	WARN_ON(apic->lapic_timer.hv_timer_in_use);
 
-	if (apic_lvtt_tscdeadline(apic) &&
-	    !atomic_read(&apic->lapic_timer.pending)) {
-		u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
-		if (!kvm_x86_ops->set_hv_timer(vcpu, tscdeadline)) {
-			apic->lapic_timer.hv_timer_in_use = true;
-			hrtimer_cancel(&apic->lapic_timer.timer);
-
-			/* In case the sw timer triggered in the window */
-			if (atomic_read(&apic->lapic_timer.pending))
-				cancel_hv_tscdeadline(apic);
-		}
-		trace_kvm_hv_timer_state(vcpu->vcpu_id,
-				apic->lapic_timer.hv_timer_in_use);
-	}
+	if (apic_lvtt_tscdeadline(apic))
+		start_hv_tscdeadline(apic);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
 
@@ -1452,18 +1460,9 @@ static void start_apic_timer(struct kvm_lapic *apic)
 			   apic->lapic_timer.period,
 			   ktime_to_ns(ktime_add_ns(now,
 					apic->lapic_timer.period)));
-	} else if (apic_lvtt_tscdeadline(apic)) {
-		/* lapic timer in tsc deadline mode */
-		u64 tscdeadline = apic->lapic_timer.tscdeadline;
-
-		if (kvm_x86_ops->set_hv_timer &&
-		    !kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
-			apic->lapic_timer.hv_timer_in_use = true;
-			trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
-					apic->lapic_timer.hv_timer_in_use);
-		} else
+	} else if (apic_lvtt_tscdeadline(apic))
+		if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
 			start_sw_tscdeadline(apic);
-	}
 }
 
 static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
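
For readers who want to reason about the fallback path outside the kernel,
the control flow of the diff above can be approximated by a small
userspace model. This is only a sketch under stated assumptions: struct
timer_model, model_start_hv and model_start_timer are hypothetical names,
program_fails stands in for a nonzero return from set_hv_timer, and the
race with a concurrently firing sw timer is not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the two timer backends; not kernel code. */
struct timer_model {
	bool hv_timer_in_use;	/* hardware preemption timer armed */
	bool sw_timer_armed;	/* hrtimer based emulation armed */
	bool pending;		/* a timer interrupt is already pending */
};

/*
 * Follows the shape of start_hv_tscdeadline() in the diff: returns true
 * if the hv timer is in use afterwards.
 */
bool model_start_hv(struct timer_model *t, bool program_fails)
{
	if (t->pending || program_fails) {
		/* Key step of the fix: drop a previously armed hv timer. */
		t->hv_timer_in_use = false;
	} else {
		t->hv_timer_in_use = true;
		t->sw_timer_armed = false;	/* models hrtimer_cancel() */
	}
	return t->hv_timer_in_use;
}

/* Follows the tsc deadline branch of start_apic_timer() in the diff. */
void model_start_timer(struct timer_model *t, bool program_fails)
{
	if (!model_start_hv(t, program_fails))
		t->sw_timer_armed = true;	/* fall back to the sw timer */
}
```

In this model, starting from a state where the hv timer is armed and then
failing to reprogram it leaves only the sw timer armed, so the two
backends never run at the same time.
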