From patchwork Wed Jun 25 20:26:50 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Borislav Petkov X-Patchwork-Id: 4424451 Return-Path: X-Original-To: patchwork-kvm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 2A576BEEAA for ; Wed, 25 Jun 2014 20:27:23 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id E279120254 for ; Wed, 25 Jun 2014 20:27:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 80CDE201DC for ; Wed, 25 Jun 2014 20:27:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755405AbaFYU07 (ORCPT ); Wed, 25 Jun 2014 16:26:59 -0400 Received: from mail.skyhub.de ([78.46.96.112]:57144 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754214AbaFYU06 (ORCPT ); Wed, 25 Jun 2014 16:26:58 -0400 X-Virus-Scanned: Nedap ESD1 at mail.skyhub.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alien8.de; s=alien8; t=1403728015; bh=rhG0GZBgLKTrsTQL7bRSZ6y7BQ/A9FGSVipJdZXyXSE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:In-Reply-To; b=n3XXxv3OHvsSgIWNv/6NfXVbPeiSFClrJrSLjs ST1b4nt0Mlz80szuqhyxg8LlPnPe7f+jA/Av2YloPv/iM862vaFLs0BE4ETj88W6HG4 w6XcPikB1wS2ZJ5SWDhSWABp3TEP9BIIdopROPUc3rxD2LWnKGHWxYTfgAezcb+8rk= Received: from mail.skyhub.de ([127.0.0.1]) by localhost (door.skyhub.de [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id lMU1bnPpZN+Z; Wed, 25 Jun 2014 22:26:54 +0200 (CEST) Received: from liondog.tnic (p5DDC6679.dip0.t-ipconnect.de [93.220.102.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.skyhub.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id 07493244977; Wed, 25 Jun 2014 22:26:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alien8.de; s=alien8; t=1403728014; bh=rhG0GZBgLKTrsTQL7bRSZ6y7BQ/A9FGSVipJdZXyXSE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:In-Reply-To; b=kIeDmsfQCc+eGFZ8HzANDAndd+Y6oNFh7LhX0w WzcN0w9v/fyb33FHUAgM4Lr6uVGVrdPqAcP0iZtRL+7APJDeIGbBjAGxbRRgmPDZFqO ejEfAnCDI+xZwEEdMQhQ6iwg/iaWh1jQ1HY7k2n/8a/mCIGymz92ZIeOY39/C98ocs= Received: by liondog.tnic (Postfix, from userid 1000) id CD75C1004C0; Wed, 25 Jun 2014 22:26:50 +0200 (CEST) Date: Wed, 25 Jun 2014 22:26:50 +0200 From: Borislav Petkov To: lkml , Paolo Bonzini Cc: Peter Zijlstra , Steven Rostedt , x86-ml , kvm@vger.kernel.org Subject: Re: __schedule #DF splat Message-ID: <20140625202650.GC13845@pd.tnic> References: <20140625153227.GA13845@pd.tnic> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20140625153227.GA13845@pd.tnic> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID,T_RP_MATCHES_RCVD,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote: > Hi guys, > > so I'm looking at this splat below when booting current linus+tip/master > in a kvm guest. Initially I thought this is something related to the > PARAVIRT gunk but it happens with and without it. Ok, here's a cleaner splat. I went and rebuilt qemu to latest master from today to rule out some breakage there but it still fires. Paolo, any ideas why would kvm+qemu trigger a #DF in the guest? I guess I should dust off my old kvm/qemu #DF debugging patch I had somewhere... I did try to avoid the invalid stack issue by doing: --- --- but that didn't work either - see second splat at the end. [ 2.704184] PANIC: double fault, error_code: 0x0 [ 2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7 [ 2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000 [ 2.708132] RIP: 0010:[] [] __schedule+0x28f/0xab0 [ 2.708132] RSP: 002b:00007fff99e51100 EFLAGS: 00013082 [ 2.708132] RAX: 000000007b206000 RBX: ffff88007b526f80 RCX: 0000000000000028 [ 2.708132] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58 [ 2.708132] RBP: 00007fff99e511f0 R08: 0000000000000000 R09: 0000000000000000 [ 2.708132] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40 [ 2.708132] R13: ffff880079c84e40 R14: 0000000000000000 R15: ffff880079c78000 [ 2.708132] FS: 00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000 [ 2.708132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.708132] CR2: 00007fff99e510f8 CR3: 000000007b206000 CR4: 00000000000006e0 [ 2.708132] Stack: [ 2.708132] BUG: unable to handle kernel paging request at 00007fff99e51100 [ 2.708132] IP: [] show_stack_log_lvl+0x11c/0x1d0 [ 2.708132] PGD 7b20d067 PUD 0 [ 2.708132] Oops: 0000 [#1] PREEMPT SMP [ 2.708132] Modules linked in: [ 2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7 [ 2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000 [ 2.708132] RIP: 0010:[] [] show_stack_log_lvl+0x11c/0x1d0 [ 2.708132] RSP: 002b:ffff88007c405e58 EFLAGS: 00013046 [ 2.708132] RAX: 00007fff99e51108 RBX: 0000000000000000 RCX: ffff88007c403fc0 [ 2.708132] RDX: 00007fff99e51100 RSI: ffff88007c400000 RDI: ffffffff81846aba [ 2.708132] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000 [ 2.708132] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58 [ 2.708132] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000 [ 2.708132] FS: 00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000 [ 2.708132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.708132] CR2: 00007fff99e51100 CR3: 000000007b206000 CR4: 00000000000006e0 [ 2.708132] Stack: [ 2.708132] 0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b206000 [ 2.708132] 00007fff99e51100 ffff88007c405f58 00007fff99e51100 0000000000000040 [ 2.708132] 0000000000000ac0 ffff880079c78000 ffff88007c405f08 ffffffff81005d10 [ 2.708132] Call Trace: [ 2.708132] <#DF> [ 2.708132] [] show_regs+0xa0/0x280 [ 2.708132] [] df_debug+0x23/0x40 [ 2.708132] [] do_double_fault+0x5d/0x80 [ 2.708132] [] double_fault+0x27/0x30 [ 2.708132] [] ? __schedule+0x27e/0xab0 [ 2.708132] [] ? __schedule+0x28f/0xab0 [ 2.708132] <> [ 2.708132] Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 [ 2.708132] RIP [] show_stack_log_lvl+0x11c/0x1d0 [ 2.708132] RSP [ 2.708132] CR2: 00007fff99e51100 [ 2.708132] ---[ end trace 749cd02c31c493a0 ]--- [ 2.708132] note: vmmouse_detect[959] exited with preempt_count 3 [ 1.730726] VFS: Mounted root (ext3 filesystem) readonly on device 8:1. [ 1.737392] devtmpfs: mounted [ 1.748817] Freeing unused kernel memory: 2872K (ffffffff819a9000 - ffffffff81c77000) [ 2.249240] udevd[812]: starting version 175 [ 2.563876] PANIC: double fault, error_code: 0x0 [ 2.563885] [ 2.564051] ====================================================== [ 2.564051] [ INFO: possible circular locking dependency detected ] [ 2.575059] 3.15.0+ #8 Not tainted [ 2.575059] ------------------------------------------------------- [ 2.575059] vmmouse_detect/960 is trying to acquire lock: [ 2.575059] ((console_sem).lock){-.....}, at: [] down_trylock+0x1d/0x50 [ 2.575059] [ 2.575059] but task is already holding lock: [ 2.575059] (&rq->lock){-.-.-.}, at: [] __schedule+0xdf/0xab0 [ 2.575059] [ 2.575059] which lock already depends on the new lock. [ 2.575059] [ 2.575059] [ 2.575059] the existing dependency chain (in reverse order) is: [ 2.575059] -> #2 (&rq->lock){-.-.-.}: [ 2.575059] [] lock_acquire+0xb9/0x200 [ 2.575059] [] _raw_spin_lock+0x41/0x80 [ 2.575059] [] wake_up_new_task+0xbb/0x290 [ 2.575059] [] do_fork+0x147/0x770 [ 2.575059] [] kernel_thread+0x26/0x30 [ 2.575059] [] rest_init+0x22/0x140 [ 2.575059] [] start_kernel+0x408/0x415 [ 2.575059] [] x86_64_start_reservations+0x2a/0x2c [ 2.575059] [] x86_64_start_kernel+0xf6/0xf9 [ 2.575059] -> #1 (&p->pi_lock){-.-.-.}: [ 2.575059] [] lock_acquire+0xb9/0x200 [ 2.575059] [] _raw_spin_lock_irqsave+0x53/0x90 [ 2.575059] [] try_to_wake_up+0x3c/0x330 [ 2.575059] [] wake_up_process+0x23/0x40 [ 2.575059] [] __up.isra.0+0x1f/0x30 [ 2.575059] [] up+0x41/0x50 [ 2.575059] [] console_unlock+0x258/0x490 [ 2.575059] [] vprintk_emit+0x291/0x610 [ 2.575059] [] printk_emit+0x33/0x3b [ 2.575059] [] devkmsg_writev+0x154/0x1d0 [ 2.575059] [] do_sync_write+0x5a/0x90 [ 2.575059] [] vfs_write+0x175/0x1c0 [ 2.575059] [] SyS_write+0x52/0xc0 [ 2.575059] [] system_call_fastpath+0x1a/0x1f [ 2.575059] -> #0 ((console_sem).lock){-.....}: [ 2.575059] [] __lock_acquire+0x1f14/0x2290 [ 2.575059] [] lock_acquire+0xb9/0x200 [ 2.575059] [] _raw_spin_lock_irqsave+0x53/0x90 [ 2.575059] [] down_trylock+0x1d/0x50 [ 2.575059] [] console_trylock+0x1e/0xb0 [ 2.575059] [] vprintk_emit+0x273/0x610 [ 2.575059] [] printk+0x4f/0x57 [ 2.575059] [] df_debug+0x1b/0x40 [ 2.575059] [] do_double_fault+0x5d/0x80 [ 2.575059] [] double_fault+0x27/0x30 [ 2.575059] [ 2.575059] other info that might help us debug this: [ 2.575059] [ 2.575059] Chain exists of: (console_sem).lock --> &p->pi_lock --> &rq->lock [ 2.575059] Possible unsafe locking scenario: [ 2.575059] [ 2.575059] CPU0 CPU1 [ 2.575059] ---- ---- [ 2.575059] lock(&rq->lock); [ 2.575059] lock(&p->pi_lock); [ 2.575059] lock(&rq->lock); [ 2.575059] lock((console_sem).lock); [ 2.575059] [ 2.575059] *** DEADLOCK *** [ 2.575059] [ 2.575059] 1 lock held by vmmouse_detect/960: [ 2.575059] #0: (&rq->lock){-.-.-.}, at: [] __schedule+0xdf/0xab0 [ 2.575059] [ 2.575059] stack backtrace: [ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8 [ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2.575059] ffffffff823ef810 ffff88007c205c50 ffffffff8160f461 ffffffff823f22b0 [ 2.575059] ffff88007c205c90 ffffffff8109fd2b ffffffff82802180 ffff880079ca2e48 [ 2.575059] ffff880079ca2718 ffff880079ca26e0 0000000000000001 ffff880079ca2e10 [ 2.575059] Call Trace: [ 2.575059] <#DF> [] dump_stack+0x4e/0x7a [ 2.575059] [] print_circular_bug+0x1fb/0x330 [ 2.575059] [] __lock_acquire+0x1f14/0x2290 [ 2.575059] [] lock_acquire+0xb9/0x200 [ 2.575059] [] ? down_trylock+0x1d/0x50 [ 2.575059] [] _raw_spin_lock_irqsave+0x53/0x90 [ 2.575059] [] ? down_trylock+0x1d/0x50 [ 2.575059] [] ? vprintk_emit+0x273/0x610 [ 2.575059] [] down_trylock+0x1d/0x50 [ 2.575059] [] ? vprintk_emit+0x273/0x610 [ 2.575059] [] console_trylock+0x1e/0xb0 [ 2.575059] [] vprintk_emit+0x273/0x610 [ 2.575059] [] printk+0x4f/0x57 [ 2.575059] [] df_debug+0x1b/0x40 [ 2.575059] [] do_double_fault+0x5d/0x80 [ 2.575059] [] double_fault+0x27/0x30 [ 2.575059] [] ? __schedule+0x27e/0xab0 [ 2.575059] [] ? __schedule+0x28f/0xab0 [ 2.575059] <> [ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8 [ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2.575059] task: ffff880079ca26e0 ti: ffff880079d04000 task.ti: ffff880079d04000 [ 2.575059] RIP: 0010:[] [] __schedule+0x28f/0xab0 [ 2.575059] RSP: 002b:00007fff70516420 EFLAGS: 00013086 [ 2.575059] RAX: 000000007ae81000 RBX: ffff88007be67900 RCX: 0000000000000028 [ 2.575059] RDX: ffffffff8161132e RSI: 0000000000000001 RDI: ffff88007c3d3c58 [ 2.575059] RBP: 00007fff70516510 R08: 0000000000000000 R09: 0000000000005c00 [ 2.575059] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3c40 [ 2.575059] R13: ffff88007b634000 R14: 0000000000000000 R15: ffff880079ca26e0 [ 2.575059] FS: 00007f77c13d6700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000 [ 2.575059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2.575059] CR2: 00007fff70516418 CR3: 000000007ae81000 CR4: 00000000000006f0 [ 2.575059] Stack: [ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff [ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff [ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff [ 2.575059] Call Trace: [ 2.575059] [ 2.575059] Code: 39 b5 80 03 00 00 0f 85 38 06 00 00 49 83 bf 88 02 00 00 00 0f 84 9a 03 00 00 49 8d 7c 24 18 48 c7 c2 2e 13 61 81 be 01 00 00 00 4c 4b a9 ff 48 8b 74 24 18 4c 89 ff 9c 55 48 89 f5 48 89 a7 [ 2.575059] Kernel panic - not syncing: Machine halted. [ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8 [ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 2.575059] ffff88007c205f18 ffff88007c205e88 ffffffff8160f461 ffffffff81817b42 [ 2.575059] ffff88007c205f08 ffffffff8160ded3 0000000000000008 ffff88007c205f18 [ 2.575059] ffff88007c205eb0 ffffffff81005cfb ffffffff81616531 0000000080000002 [ 2.575059] Call Trace: [ 2.575059] <#DF> [] dump_stack+0x4e/0x7a [ 2.575059] [] panic+0xc5/0x1e1 [ 2.575059] [] ? show_regs+0x5b/0x280 [ 2.575059] [] ? _raw_spin_unlock_irqrestore+0x41/0x90 [ 2.575059] [] df_debug+0x31/0x40 [ 2.575059] [] do_double_fault+0x5d/0x80 [ 2.575059] [] double_fault+0x27/0x30 [ 2.575059] [] ? __schedule+0x27e/0xab0 [ 2.575059] [] ? __schedule+0x28f/0xab0 [ 2.575059] <> [ 2.575059] Shutting down cpus with NMI [ 2.575059] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) [ 2.575059] ---[ end Kernel panic - not syncing: Machine halted. diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c index 1abcb50b48ae..dd8e0eec071e 100644 --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs, } if (i && ((i % STACKSLOTS_PER_LINE) == 0)) pr_cont("\n"); - pr_cont(" %016lx", *stack++); + pr_cont(" %016lx", (((unsigned long)stack <= 0x00007fffffffffffUL) ? -1 : *stack++)); touch_nmi_watchdog(); } preempt_enable();