From patchwork Wed Oct 26 22:21:53 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Linus Torvalds
X-Patchwork-Id: 9398537
References: <20161021200245.kahjzgqzdfyoe3uz@codemonkey.org.uk>
 <20161022152033.gkmm3l75kqjzsije@codemonkey.org.uk>
 <20161024044051.onmh4h6sc2bjxzzc@codemonkey.org.uk>
 <77d9983d-a00a-1dc1-a9a1-631de1d0c146@fb.com>
 <20161026002752.qvrm6yxqb54fiqnd@codemonkey.org.uk>
 <20161026163018.wx57yy554576s6e2@codemonkey.org.uk>
 <20161026184201.6ofblkd3j5uxystq@codemonkey.org.uk>
 <488f9edc-6a1c-2c68-0d33-d3aa32ece9a4@fb.com>
From: Linus Torvalds
Date: Wed, 26 Oct 2016 15:21:53 -0700
Subject: Re: bio linked list corruption.
To: Chris Mason
Cc: Dave Jones, Andy Lutomirski, Andy Lutomirski, Jens Axboe, Al Viro,
 Josef Bacik, David Sterba, linux-btrfs, Linux Kernel, Dave Chinner
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org

On Wed, Oct 26, 2016 at 2:52 PM, Chris Mason wrote:
>
> This one is special because CONFIG_VMAP_STACK is not set. Btrfs triggers in < 10 minutes.
> I've done 30 minutes each with XFS and Ext4 without luck.

Ok, see the email I wrote that crossed yours - if it's really some
list corruption on ctx->rq_list due to some locking problem, I really
would expect CONFIG_VMAP_STACK to be entirely irrelevant, except
perhaps from a timing standpoint.

> WARNING: CPU: 6 PID: 4481 at lib/list_debug.c:33 __list_add+0xbe/0xd0
> list_add corruption. prev->next should be next (ffffe8ffffd80b08), but was ffff88012b65fb88. (prev=ffff880128c8d500).
> Modules linked in: crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper i2c_piix4 cryptd i2c_core virtio_net serio_raw floppy button pcspkr sch_fq_codel autofs4 virtio_blk
> CPU: 6 PID: 4481 Comm: dbench Not tainted 4.9.0-rc2-15419-g811d54d #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.0-1.fc24 04/01/2014
>  ffff880104eff868 ffffffff814fde0f ffffffff8151c46e ffff880104eff8c8
>  ffff880104eff8c8 0000000000000000 ffff880104eff8b8 ffffffff810648cf
>  ffff880128cab2c0 000000213fc57c68 ffff8801384e8928 ffff880128cab180
> Call Trace:
>  [] dump_stack+0x53/0x74
>  [] ? __list_add+0xbe/0xd0
>  [] __warn+0xff/0x120
>  [] warn_slowpath_fmt+0x49/0x50
>  [] __list_add+0xbe/0xd0
>  [] blk_sq_make_request+0x388/0x580
>  [] generic_make_request+0x104/0x200

Well, it's very consistent, I have to say. So I really don't think
this is random corruption.

Could you try the attached patch?
It adds a couple of sanity tests:

 - a number of tests to verify that 'rq->queuelist' isn't already on
   some queue when it is added to a queue

 - one test to verify that rq->mq_ctx is the same ctx that we have locked.

I may be completely full of shit, and this patch may be pure garbage
or "obviously will never trigger", but humor me.

              Linus

 block/blk-mq.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc2eed64771..4f575de7fdd0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -521,6 +521,8 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 	 */
 	BUG_ON(rq->cmd_flags & REQ_SOFTBARRIER);
 
+	WARN_ON_ONCE(!list_empty(&rq->queuelist));
+
 	spin_lock_irqsave(&q->requeue_lock, flags);
 	if (at_head) {
 		rq->cmd_flags |= REQ_SOFTBARRIER;
@@ -838,6 +840,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 			queued++;
 			break;
 		case BLK_MQ_RQ_QUEUE_BUSY:
+			WARN_ON_ONCE(!list_empty(&rq->queuelist));
 			list_add(&rq->queuelist, &rq_list);
 			__blk_mq_requeue_request(rq);
 			break;
@@ -1034,6 +1037,8 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx,
 
 	trace_block_rq_insert(hctx->queue, rq);
 
+	WARN_ON_ONCE(!list_empty(&rq->queuelist));
+
 	if (at_head)
 		list_add(&rq->queuelist, &ctx->rq_list);
 	else
@@ -1137,6 +1142,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 			depth = 0;
 		}
 
+		WARN_ON_ONCE(!list_empty(&rq->queuelist));
 		depth++;
 		list_add_tail(&rq->queuelist, &ctx_list);
 	}
@@ -1172,6 +1178,7 @@ static inline bool blk_mq_merge_queue_io(struct blk_mq_hw_ctx *hctx,
 		blk_mq_bio_to_request(rq, bio);
 		spin_lock(&ctx->lock);
 insert_rq:
+		WARN_ON_ONCE(rq->mq_ctx != ctx);
 		__blk_mq_insert_request(hctx, rq, false);
 		spin_unlock(&ctx->lock);
 		return false;
@@ -1326,6 +1333,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 				old_rq = same_queue_rq;
 				list_del_init(&old_rq->queuelist);
 			}
+			WARN_ON_ONCE(!list_empty(&rq->queuelist));
 			list_add_tail(&rq->queuelist, &plug->mq_list);
 		} else /* is_sync */
 			old_rq = rq;
@@ -1412,6 +1420,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 			trace_block_plug(q);
 		}
 
+		WARN_ON_ONCE(!list_empty(&rq->queuelist));
 		list_add_tail(&rq->queuelist, &plug->mq_list);
 		return cookie;
 	}
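
For readers following along outside the kernel tree, here is a rough
userspace sketch of what the new list_empty() tests are meant to catch.
It is not kernel code: struct list_head is re-declared locally, and
init_list_head(), list_is_empty(), list_add_tail_checked() and
demo_double_add() are hypothetical names made up for this illustration.
The assumption is that a request which is still linked on one queue gets
added to a second one, which is one way to end up with the
"prev->next should be next" report quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A node (or list head) that points at itself is on no list. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/*
 * Hypothetical checked insert at the tail of 'head':
 *  - counts a warning when 'entry' is already linked somewhere, which is
 *    the condition the WARN_ON_ONCE(!list_empty(...)) lines test for;
 *  - refuses the insert when the neighbours disagree, similar in spirit
 *    to the "prev->next should be next" check in the report.
 */
static bool list_add_tail_checked(struct list_head *entry,
				  struct list_head *head, int *warnings)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	if (!list_is_empty(entry))
		(*warnings)++;

	if (next->prev != prev || prev->next != next) {
		fprintf(stderr, "list_add corruption: prev->next should be next\n");
		return false;
	}

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/*
 * Queue one request on list a, then on list b without unlinking it first.
 * Returns true if a later insert on list a is rejected as corrupted.
 */
bool demo_double_add(int *warnings)
{
	struct list_head a, b, rq, other;
	bool ok;

	init_list_head(&a);
	init_list_head(&b);
	init_list_head(&rq);
	init_list_head(&other);
	*warnings = 0;

	ok = list_add_tail_checked(&rq, &a, warnings);	/* clean insert */
	assert(ok);
	ok = list_add_tail_checked(&rq, &b, warnings);	/* rq still on a: warns */
	assert(ok);

	/* a.prev still points at rq, but rq.next now points at b. */
	return !list_add_tail_checked(&other, &a, warnings);
}
```

In this sketch the early check fires at the second insert, the point
where the request is first misused, while the neighbour check only trips
on a later, unrelated insert into the damaged list. That is the reason
for testing at add time rather than waiting for the list debug code to
notice.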