Message ID | CA+55aFwY5jNcUAJzDWERP2r9iZHEKzHSwMi5AJCJiaVd3Z0E-g@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote: > Could you try the attached patch? It adds a couple of sanity tests: > > - a number of tests to verify that 'rq->queuelist' isn't already on > some queue when it is added to a queue > > - one test to verify that rq->mq_ctx is the same ctx that we have locked. > > I may be completely full of shit, and this patch may be pure garbage > or "obviously will never trigger", but humor me. I gave it a shot too for shits & giggles. This falls out during boot. [ 9.244030] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null) [ 9.271391] ------------[ cut here ]------------ [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0 [ 9.285613] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-rc2-think+ #4 [ 9.300106] ffffc90000013848 [ 9.307353] ffffffff8130d27c [ 9.307420] 0000000000000000 [ 9.307456] 0000000000000000 [ 9.307494] ffffc90000013888 [ 9.314780] ffffffff81077a41 [ 9.314847] 0000049d00013898 [ 9.314884] 0000000000000001 [ 9.314922] 0000000000000002 [ 9.322213] ffffe8ffffc06600 [ 9.322280] ffffe8ffff606600 [ 9.322317] ffff8805015c2e80 [ 9.322355] Call Trace: [ 9.329689] [<ffffffff8130d27c>] dump_stack+0x4f/0x73 [ 9.337208] [<ffffffff81077a41>] __warn+0xc1/0xe0 [ 9.344730] [<ffffffff81077b18>] warn_slowpath_null+0x18/0x20 [ 9.352282] [<ffffffff812f7975>] blk_sq_make_request+0x465/0x4a0 [ 9.359816] [<ffffffff812eb44a>] ? generic_make_request+0xca/0x210 [ 9.367367] [<ffffffff812eb457>] generic_make_request+0xd7/0x210 [ 9.374931] [<ffffffff812eb5f9>] submit_bio+0x69/0x120 [ 9.382488] [<ffffffff812004ef>] ? submit_bh_wbc+0x16f/0x1e0 [ 9.390083] [<ffffffff812004ef>] submit_bh_wbc+0x16f/0x1e0 [ 9.397673] [<ffffffff81200bfe>] submit_bh+0xe/0x10 [ 9.405205] [<ffffffff81255eac>] __ext4_get_inode_loc+0x1ac/0x3d0 [ 9.412795] [<ffffffff81258f4b>] ext4_iget+0x6b/0xb70 [ 9.420366] [<ffffffff811d6018>] ? lookup_slow+0xf8/0x1f0 [ 9.427947] [<ffffffff81259a7a>] ext4_iget_normal+0x2a/0x30 [ 9.435541] [<ffffffff81264734>] ext4_lookup+0x114/0x230 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 26, 2016 at 3:40 PM, Dave Jones <davej@codemonkey.org.uk> wrote: > > I gave it a shot too for shits & giggles. > This falls out during boot. > > [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0 Hmm. That's the WARN_ON_ONCE(rq->mq_ctx != ctx); that I added to blk_mq_merge_queue_io(), and I really think that warning is valid, and the fact that it triggers shows that something is wrong with locking. We just did a spin_lock(&ctx->lock); and that lock is *supposed* to protect the __blk_mq_insert_request(), but that uses rq->mq_ctx. So if rq->mq_ctx != ctx, then we're locking the wrong context. Jens - please explain to me why I'm wrong. Or maybe I actually might have found the problem? In which case please send me a patch that fixes it ;) Dave: it might be a good idea to split that "WARN_ON_ONCE()" in blk_mq_merge_queue_io() into two, since right now it can trigger both for the blk_mq_bio_to_request(rq, bio); path _and_ for the if (!blk_mq_attempt_merge(q, ctx, bio)) { blk_mq_bio_to_request(rq, bio); goto insert_rq; path. If you split it into two: one before that "insert_rq:" label, and one before the "goto insert_rq" thing, then we could see if it is just one of the blk_mq_merge_queue_io() cases (or both) that is broken.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/26/2016 04:51 PM, Linus Torvalds wrote: > On Wed, Oct 26, 2016 at 3:40 PM, Dave Jones <davej@codemonkey.org.uk> wrote: >> >> I gave it a shot too for shits & giggles. >> This falls out during boot. >> >> [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0 > > Hmm. That's the > > WARN_ON_ONCE(rq->mq_ctx != ctx); > > that I added to blk_mq_merge_queue_io(), and I really think that > warning is valid, and the fact that it triggers shows that something > is wrong with locking. > > We just did a > > spin_lock(&ctx->lock); > > and that lock is *supposed* to protect the __blk_mq_insert_request(), > but that uses rq->mq_ctx. > > So if rq->mq_ctx != ctx, then we're locking the wrong context. > > Jens - please explain to me why I'm wrong. > > Or maybe I actually might have found the problem? In which case please > send me a patch that fixes it ;) I think you're pretty close, the two should not be different and I don't immediately see how. I'll run some testing here, should be easier with this knowledge. > Dave: it might be a good idea to split that "WARN_ON_ONCE()" in > blk_mq_merge_queue_io() into two, since right now it can trigger both > for the > > blk_mq_bio_to_request(rq, bio); > > path _and_ for the > > if (!blk_mq_attempt_merge(q, ctx, bio)) { > blk_mq_bio_to_request(rq, bio); > goto insert_rq; And just in case I can't trigger, would be interesting to add a call to blk_rq_dump_flags() as well, in case this is some special request.
On Wed, Oct 26, 2016 at 3:51 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Dave: it might be a good idea to split that "WARN_ON_ONCE()" in > blk_mq_merge_queue_io() into two I did that myself too, since Dave sees this during boot. But I'm not getting the warning ;( Dave gets it with ext4, and thats' what I have too, so I'm not sure what the required trigger would be. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 26, 2016 at 03:51:01PM -0700, Linus Torvalds wrote: > Dave: it might be a good idea to split that "WARN_ON_ONCE()" in > blk_mq_merge_queue_io() into two, since right now it can trigger both > for the > > blk_mq_bio_to_request(rq, bio); > > path _and_ for the > > if (!blk_mq_attempt_merge(q, ctx, bio)) { > blk_mq_bio_to_request(rq, bio); > goto insert_rq; > > path. If you split it into two: one before that "insert_rq:" label, > and one before the "goto insert_rq" thing, then we could see if it is > just one of the blk_mq_merge_queue_io() cases (or both) that is > broken.. It's the latter of the two. [ 12.302392] WARNING: CPU: 3 PID: 272 at block/blk-mq.c:1191 blk_sq_make_request+0x320/0x4d0 Dave -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/block/blk-mq.c b/block/blk-mq.c index ddc2eed64771..4f575de7fdd0 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -521,6 +521,8 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head) */ BUG_ON(rq->cmd_flags & REQ_SOFTBARRIER); +WARN_ON_ONCE(!list_empty(&rq->queuelist)); + spin_lock_irqsave(&q->requeue_lock, flags); if (at_head) { rq->cmd_flags |= REQ_SOFTBARRIER; @@ -838,6 +840,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx) queued++; break; case BLK_MQ_RQ_QUEUE_BUSY: +WARN_ON_ONCE(!list_empty(&rq->queuelist)); list_add(&rq->queuelist, &rq_list); __blk_mq_requeue_request(rq); break; @@ -1034,6 +1037,8 @@ static inline void __blk_mq_insert_req_list(struct blk_mq_hw_ctx *hctx, trace_block_rq_insert(hctx->queue, rq); +WARN_ON_ONCE(!list_empty(&rq->queuelist)); + if (at_head) list_add(&rq->queuelist, &ctx->rq_list); else @@ -1137,6 +1142,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule) depth = 0; } +WARN_ON_ONCE(!list_empty(&rq->queuelist)); depth++; list_add_tail(&rq->queuelist, &ctx_list); } @@ -1172,6 +1178,7 @@ static inline bool blk_mq_merge_queue_io(struct blk_mq_hw_ctx *hctx, blk_mq_bio_to_request(rq, bio); spin_lock(&ctx->lock); insert_rq: +WARN_ON_ONCE(rq->mq_ctx != ctx); __blk_mq_insert_request(hctx, rq, false); spin_unlock(&ctx->lock); return false; @@ -1326,6 +1333,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) old_rq = same_queue_rq; list_del_init(&old_rq->queuelist); } +WARN_ON_ONCE(!list_empty(&rq->queuelist)); list_add_tail(&rq->queuelist, &plug->mq_list); } else /* is_sync */ old_rq = rq; @@ -1412,6 +1420,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio) trace_block_plug(q); } +WARN_ON_ONCE(!list_empty(&rq->queuelist)); list_add_tail(&rq->queuelist, &plug->mq_list); return cookie; }