From patchwork Mon Dec 5 17:21:35 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Jones
X-Patchwork-Id: 9461237
Date: Mon, 5 Dec 2016 12:21:35 -0500
From: Dave Jones
To: Vegard Nossum
Cc: Chris Mason, Linus Torvalds, Jens Axboe, Andy Lutomirski, Al Viro,
 Josef Bacik, David Sterba, linux-btrfs, Linux Kernel, Dave Chinner
Subject: Re: bio linked list corruption.
Message-ID: <20161205172135.7tncgtcqhhgngmy4@codemonkey.org.uk>
References: <20161026233808.GC15247@clm-mbp.thefacebook.com>
 <20161026234751.e66xyzjiwifvbuha@codemonkey.org.uk>
 <20161031185514.b22zvbxvga4xcinz@codemonkey.org.uk>
 <20161031194454.GA49877@clm-mbp.thefacebook.com>
 <20161123193419.pq7adje2eanky2wx@codemonkey.org.uk>
 <20161123195845.iphzr7ac4mu5ewjt@codemonkey.org.uk>
User-Agent: NeoMutt/20161104 (1.7.1)
X-Mailing-List: linux-btrfs@vger.kernel.org

On Mon, Dec 05, 2016 at 06:09:29PM +0100, Vegard Nossum wrote:
> On 5 December 2016 at 12:10, Vegard Nossum wrote:
> > On 5 December 2016 at 00:04, Vegard Nossum wrote:
> >> FWIW I hit this as well:
> >>
> >> BUG: unable to handle kernel paging request at ffffffff81ff08b7
> >> IP: [] __lock_acquire.isra.32+0xda/0x1a30
> >> CPU: 0 PID: 21744 Comm: trinity-c56 Tainted: G B 4.9.0-rc7+ #217
> > [...]
> >
> >> I think you can rule out btrfs in any case, probably block layer as
> >> well, since it looks like this comes from shmem.
> >
> > I should rather say that the VM runs on a 9p root filesystem and it
> > doesn't use/mount any block devices or disk-based filesystems.
> >
> > I have all the trinity logs for the crash if that's useful. I tried a
> > couple of runs with just the (at the time) in-progress syscalls but it
> > didn't turn up anything interesting. Otherwise it seems like a lot of
> > data to go through by hand.
>
> I've hit this another 7 times in the past ~3 hours.
>
> Three times the address being dereferenced has pointed to
> iov_iter_init+0xaf (even across a kernel rebuild), three times it has
> pointed to put_prev_entity+0x55, once to 0x800000008, and twice to
> 0x292. The fact that it would hit even one of those more than once
> across runs is pretty suspicious to me, although the ones that point
> to iov_iter_init and put_prev_entity point to "random" instructions in
> the sense that they are neither entry points nor return addresses.
>
> shmem_fault() was always on the stack, but it came from different
> syscalls: add_key(), newuname(), pipe2(), newstat(), fstat(),
> clock_settime(), mount(), etc.

> ------------[ cut here ]------------

> The warning shows that it made it past the list_empty_careful() check
> in finish_wait() but then bugs out on the &wait->task_list
> dereference.

I just pushed out the ftrace changes I made to Trinity that might help
you gather more clues. Right now it's hardcoded to dump a trace to
/boot/trace.txt when it detects the kernel has become tainted.

Before a trinity run, I run this as root:

#!/bin/sh

cd /sys/kernel/debug/tracing/

# Per-CPU ring buffer size in KB, then select the function tracer.
echo 10000 > buffer_size_kb
echo function > current_tracer

# Exclude the noisy symbols, one per append so earlier entries are kept.
for i in $(cat /home/davej/blacklist-symbols)
do
  echo "$i" >> set_ftrace_notrace
done

# Start tracing.
echo 1 > tracing_on

blacklist-symbols lists the noisier functions that pollute traces.
Right now I use these: https://paste.fedoraproject.org/499794/14809582/
You may need to add some more.

(I'll get around to making all this scripting go away, and have trinity
just set this stuff up itself eventually.)

Oh, and if you're not running as root, you might need a diff like below
so that trinity can stop the trace when it detects tainting.

	Dave

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 8696ce6bf2f6..2d6c97e871e0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7217,7 +7217,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry *d_tracer)
 	trace_create_file("trace_clock", 0644, d_tracer, tr,
 			  &trace_clock_fops);
 
-	trace_create_file("tracing_on", 0644, d_tracer,
+	trace_create_file("tracing_on", 0666, d_tracer,
 			  tr, &rb_simple_fops);
 
 	create_trace_options_dir(tr);
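
For anyone who wants the "stop the trace on taint" behaviour without
Trinity, a rough shell sketch follows. It is not Trinity's code: the
function names, the one-second polling and the environment overrides are
made up for illustration. It only assumes that the taint value is
readable from /proc/sys/kernel/tainted and that tracing_on and trace
live in the tracing directory used in the script above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Trinity: stop ftrace and save the
# trace buffer once the kernel taint value changes.
# The three paths can be overridden from the environment (illustrative).
TAINT_FILE=${TAINT_FILE:-/proc/sys/kernel/tainted}
TRACING_DIR=${TRACING_DIR:-/sys/kernel/debug/tracing}
TRACE_OUT=${TRACE_OUT:-/boot/trace.txt}

# check_taint BASELINE
# Returns 1 if the taint value still equals BASELINE.
# Otherwise stops tracing, copies the trace to TRACE_OUT and returns 0.
check_taint() {
    baseline=$1
    current=$(cat "$TAINT_FILE")
    if [ "$current" = "$baseline" ]; then
        return 1
    fi
    # Stop tracing first so the buffer is not overwritten while copying.
    echo 0 > "$TRACING_DIR/tracing_on"
    cat "$TRACING_DIR/trace" > "$TRACE_OUT"
    return 0
}

# watch_taint
# Records the current taint value and polls once a second until it changes.
watch_taint() {
    start=$(cat "$TAINT_FILE")
    until check_taint "$start"; do
        sleep 1
    done
    echo "kernel tainted; trace saved to $TRACE_OUT"
}
```

Source the file and run "watch_taint &" before starting the fuzzer. If
it runs as a non-root user, tracing_on has to be writable by that user,
which is what the diff above is for.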