Message ID | bf3c52f2-3150-1d0c-2ae3-740032707129@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote: > We must have dropped the ctx before we call > blk_mq_sched_insert_request() with can_block=true, otherwise we risk > that a flush request can block on insertion if we are currently out of > tags. > > [ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002 > [ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre > [ 47.690572] Preemption disabled at: > [ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280 > [ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271 > [ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 > [ 47.718081] Call Trace: > [ 47.720903] dump_stack+0x4f/0x73 > [ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280 > [ 47.730137] __schedule_bug+0x6c/0xc0 > [ 47.734314] __schedule+0x559/0x780 > [ 47.738302] schedule+0x3b/0x90 > [ 47.741899] io_schedule+0x11/0x40 > [ 47.745788] blk_mq_get_tag+0x167/0x2a0 > [ 47.750162] ? remove_wait_queue+0x70/0x70 > [ 47.754901] blk_mq_get_driver_tag+0x92/0xf0 > [ 47.759758] blk_mq_sched_insert_request+0x134/0x170 > [ 47.765398] ? blk_account_io_start+0xd0/0x270 > [ 47.770679] blk_mq_make_request+0x1b2/0x850 > [ 47.775766] generic_make_request+0xf7/0x2d0 > [ 47.780860] submit_bio+0x5f/0x120 > [ 47.784979] ? submit_bio+0x5f/0x120 > [ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130 > [ 47.794902] submit_bh+0xb/0x10 > [ 47.798719] journal_submit_commit_record+0x190/0x210 > [ 47.804686] ? _raw_spin_unlock+0x13/0x30 > [ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00 > [ 47.815925] kjournald2+0xb6/0x250 > [ 47.820022] ? kjournald2+0xb6/0x250 > [ 47.824328] ? remove_wait_queue+0x70/0x70 > [ 47.829223] kthread+0x10e/0x140 > [ 47.833147] ? commit_timeout+0x10/0x10 > [ 47.837742] ? kthread_create_on_node+0x40/0x40 > [ 47.843122] ret_from_fork+0x29/0x40 > > Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request") > Signed-off-by: Jens Axboe <axboe@fb.com> > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 9d7645f24b05..323eed50d615 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) > blk_mq_try_issue_directly(data.hctx, rq, &cookie); > return cookie; > } else if (q->elevator) { > + blk_mq_put_ctx(data.ctx); > blk_mq_bio_to_request(rq, bio); > blk_mq_sched_insert_request(rq, false, true, true, true); > + return cookie; > } else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) > blk_mq_run_hw_queue(data.hctx, true); > > I'm confused, the first thing we check in make_request is: if (unlikely(is_flush_fua)) { blk_mq_bio_to_request(rq, bio); if (q->elevator) { blk_mq_sched_insert_request(rq, false, true, true, true); } else { blk_insert_flush(rq); blk_mq_run_hw_queue(data.hctx, true); } } and can_block doesn't do anything in the !flush case, so shouldn't it be changed in that one instead? Alternatively, blk_mq_get_tag() knows how to drop the ctx before it sleeps, but blk_mq_get_driver_tag() doesn't set data.ctx. I'm not sure if that'll do what we want, and this is making my head hurt.
On 04/20/2017 03:30 PM, Omar Sandoval wrote: > On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote: >> We must have dropped the ctx before we call >> blk_mq_sched_insert_request() with can_block=true, otherwise we risk >> that a flush request can block on insertion if we are currently out of >> tags. >> >> [ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002 >> [ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre >> [ 47.690572] Preemption disabled at: >> [ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280 >> [ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271 >> [ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 >> [ 47.718081] Call Trace: >> [ 47.720903] dump_stack+0x4f/0x73 >> [ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280 >> [ 47.730137] __schedule_bug+0x6c/0xc0 >> [ 47.734314] __schedule+0x559/0x780 >> [ 47.738302] schedule+0x3b/0x90 >> [ 47.741899] io_schedule+0x11/0x40 >> [ 47.745788] blk_mq_get_tag+0x167/0x2a0 >> [ 47.750162] ? remove_wait_queue+0x70/0x70 >> [ 47.754901] blk_mq_get_driver_tag+0x92/0xf0 >> [ 47.759758] blk_mq_sched_insert_request+0x134/0x170 >> [ 47.765398] ? blk_account_io_start+0xd0/0x270 >> [ 47.770679] blk_mq_make_request+0x1b2/0x850 >> [ 47.775766] generic_make_request+0xf7/0x2d0 >> [ 47.780860] submit_bio+0x5f/0x120 >> [ 47.784979] ? submit_bio+0x5f/0x120 >> [ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130 >> [ 47.794902] submit_bh+0xb/0x10 >> [ 47.798719] journal_submit_commit_record+0x190/0x210 >> [ 47.804686] ? _raw_spin_unlock+0x13/0x30 >> [ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00 >> [ 47.815925] kjournald2+0xb6/0x250 >> [ 47.820022] ? kjournald2+0xb6/0x250 >> [ 47.824328] ? remove_wait_queue+0x70/0x70 >> [ 47.829223] kthread+0x10e/0x140 >> [ 47.833147] ? commit_timeout+0x10/0x10 >> [ 47.837742] ? kthread_create_on_node+0x40/0x40 >> [ 47.843122] ret_from_fork+0x29/0x40 >> >> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request") >> Signed-off-by: Jens Axboe <axboe@fb.com> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 9d7645f24b05..323eed50d615 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) >> blk_mq_try_issue_directly(data.hctx, rq, &cookie); >> return cookie; >> } else if (q->elevator) { >> + blk_mq_put_ctx(data.ctx); >> blk_mq_bio_to_request(rq, bio); >> blk_mq_sched_insert_request(rq, false, true, true, true); >> + return cookie; >> } else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) >> blk_mq_run_hw_queue(data.hctx, true); >> >> > > I'm confused, the first thing we check in make_request is: > > if (unlikely(is_flush_fua)) { > blk_mq_bio_to_request(rq, bio); > if (q->elevator) { > blk_mq_sched_insert_request(rq, false, true, true, > true); > } else { > blk_insert_flush(rq); > blk_mq_run_hw_queue(data.hctx, true); > } > } > > and can_block doesn't do anything in the !flush case, so shouldn't it be > changed in that one instead? Just to get closure on this issue, the two cases ends up being folded into one. So we're really triggering the first case, but it's a jump to the 2nd one. Both cases should still be fixed up, the 2nd patch I sent out should be fine.
On Thu, Apr 20, 2017 at 04:39:10PM -0600, Jens Axboe wrote: > On 04/20/2017 03:30 PM, Omar Sandoval wrote: > > On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote: > >> We must have dropped the ctx before we call > >> blk_mq_sched_insert_request() with can_block=true, otherwise we risk > >> that a flush request can block on insertion if we are currently out of > >> tags. > >> > >> [ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002 > >> [ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre > >> [ 47.690572] Preemption disabled at: > >> [ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280 > >> [ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271 > >> [ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 > >> [ 47.718081] Call Trace: > >> [ 47.720903] dump_stack+0x4f/0x73 > >> [ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280 > >> [ 47.730137] __schedule_bug+0x6c/0xc0 > >> [ 47.734314] __schedule+0x559/0x780 > >> [ 47.738302] schedule+0x3b/0x90 > >> [ 47.741899] io_schedule+0x11/0x40 > >> [ 47.745788] blk_mq_get_tag+0x167/0x2a0 > >> [ 47.750162] ? remove_wait_queue+0x70/0x70 > >> [ 47.754901] blk_mq_get_driver_tag+0x92/0xf0 > >> [ 47.759758] blk_mq_sched_insert_request+0x134/0x170 > >> [ 47.765398] ? blk_account_io_start+0xd0/0x270 > >> [ 47.770679] blk_mq_make_request+0x1b2/0x850 > >> [ 47.775766] generic_make_request+0xf7/0x2d0 > >> [ 47.780860] submit_bio+0x5f/0x120 > >> [ 47.784979] ? submit_bio+0x5f/0x120 > >> [ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130 > >> [ 47.794902] submit_bh+0xb/0x10 > >> [ 47.798719] journal_submit_commit_record+0x190/0x210 > >> [ 47.804686] ? _raw_spin_unlock+0x13/0x30 > >> [ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00 > >> [ 47.815925] kjournald2+0xb6/0x250 > >> [ 47.820022] ? kjournald2+0xb6/0x250 > >> [ 47.824328] ? remove_wait_queue+0x70/0x70 > >> [ 47.829223] kthread+0x10e/0x140 > >> [ 47.833147] ? commit_timeout+0x10/0x10 > >> [ 47.837742] ? kthread_create_on_node+0x40/0x40 > >> [ 47.843122] ret_from_fork+0x29/0x40 > >> > >> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request") > >> Signed-off-by: Jens Axboe <axboe@fb.com> > >> > >> diff --git a/block/blk-mq.c b/block/blk-mq.c > >> index 9d7645f24b05..323eed50d615 100644 > >> --- a/block/blk-mq.c > >> +++ b/block/blk-mq.c > >> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) > >> blk_mq_try_issue_directly(data.hctx, rq, &cookie); > >> return cookie; > >> } else if (q->elevator) { > >> + blk_mq_put_ctx(data.ctx); > >> blk_mq_bio_to_request(rq, bio); > >> blk_mq_sched_insert_request(rq, false, true, true, true); > >> + return cookie; > >> } else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) > >> blk_mq_run_hw_queue(data.hctx, true); > >> > >> > > > > I'm confused, the first thing we check in make_request is: > > > > if (unlikely(is_flush_fua)) { > > blk_mq_bio_to_request(rq, bio); > > if (q->elevator) { > > blk_mq_sched_insert_request(rq, false, true, true, > > true); > > } else { > > blk_insert_flush(rq); > > blk_mq_run_hw_queue(data.hctx, true); > > } > > } > > > > and can_block doesn't do anything in the !flush case, so shouldn't it be > > changed in that one instead? > > Just to get closure on this issue, the two cases ends up being folded > into one. So we're really triggering the first case, but it's a jump > to the 2nd one. > > Both cases should still be fixed up, the 2nd patch I sent out should be > fine. You can add Reviewed-by: Omar Sandoval <osandov@fb.com> for the 2nd patch.
On 04/20/2017 04:41 PM, Omar Sandoval wrote: > On Thu, Apr 20, 2017 at 04:39:10PM -0600, Jens Axboe wrote: >> On 04/20/2017 03:30 PM, Omar Sandoval wrote: >>> On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote: >>>> We must have dropped the ctx before we call >>>> blk_mq_sched_insert_request() with can_block=true, otherwise we risk >>>> that a flush request can block on insertion if we are currently out of >>>> tags. >>>> >>>> [ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002 >>>> [ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre >>>> [ 47.690572] Preemption disabled at: >>>> [ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280 >>>> [ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271 >>>> [ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 >>>> [ 47.718081] Call Trace: >>>> [ 47.720903] dump_stack+0x4f/0x73 >>>> [ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280 >>>> [ 47.730137] __schedule_bug+0x6c/0xc0 >>>> [ 47.734314] __schedule+0x559/0x780 >>>> [ 47.738302] schedule+0x3b/0x90 >>>> [ 47.741899] io_schedule+0x11/0x40 >>>> [ 47.745788] blk_mq_get_tag+0x167/0x2a0 >>>> [ 47.750162] ? remove_wait_queue+0x70/0x70 >>>> [ 47.754901] blk_mq_get_driver_tag+0x92/0xf0 >>>> [ 47.759758] blk_mq_sched_insert_request+0x134/0x170 >>>> [ 47.765398] ? blk_account_io_start+0xd0/0x270 >>>> [ 47.770679] blk_mq_make_request+0x1b2/0x850 >>>> [ 47.775766] generic_make_request+0xf7/0x2d0 >>>> [ 47.780860] submit_bio+0x5f/0x120 >>>> [ 47.784979] ? submit_bio+0x5f/0x120 >>>> [ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130 >>>> [ 47.794902] submit_bh+0xb/0x10 >>>> [ 47.798719] journal_submit_commit_record+0x190/0x210 >>>> [ 47.804686] ? _raw_spin_unlock+0x13/0x30 >>>> [ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00 >>>> [ 47.815925] kjournald2+0xb6/0x250 >>>> [ 47.820022] ? kjournald2+0xb6/0x250 >>>> [ 47.824328] ? remove_wait_queue+0x70/0x70 >>>> [ 47.829223] kthread+0x10e/0x140 >>>> [ 47.833147] ? commit_timeout+0x10/0x10 >>>> [ 47.837742] ? kthread_create_on_node+0x40/0x40 >>>> [ 47.843122] ret_from_fork+0x29/0x40 >>>> >>>> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request") >>>> Signed-off-by: Jens Axboe <axboe@fb.com> >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 9d7645f24b05..323eed50d615 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) >>>> blk_mq_try_issue_directly(data.hctx, rq, &cookie); >>>> return cookie; >>>> } else if (q->elevator) { >>>> + blk_mq_put_ctx(data.ctx); >>>> blk_mq_bio_to_request(rq, bio); >>>> blk_mq_sched_insert_request(rq, false, true, true, true); >>>> + return cookie; >>>> } else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) >>>> blk_mq_run_hw_queue(data.hctx, true); >>>> >>>> >>> >>> I'm confused, the first thing we check in make_request is: >>> >>> if (unlikely(is_flush_fua)) { >>> blk_mq_bio_to_request(rq, bio); >>> if (q->elevator) { >>> blk_mq_sched_insert_request(rq, false, true, true, >>> true); >>> } else { >>> blk_insert_flush(rq); >>> blk_mq_run_hw_queue(data.hctx, true); >>> } >>> } >>> >>> and can_block doesn't do anything in the !flush case, so shouldn't it be >>> changed in that one instead? >> >> Just to get closure on this issue, the two cases ends up being folded >> into one. So we're really triggering the first case, but it's a jump >> to the 2nd one. >> >> Both cases should still be fixed up, the 2nd patch I sent out should be >> fine. > > You can add > > Reviewed-by: Omar Sandoval <osandov@fb.com> > > for the 2nd patch. Thanks Omar, queued up for 4.12.
diff --git a/block/blk-mq.c b/block/blk-mq.c index 9d7645f24b05..323eed50d615 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio) blk_mq_try_issue_directly(data.hctx, rq, &cookie); return cookie; } else if (q->elevator) { + blk_mq_put_ctx(data.ctx); blk_mq_bio_to_request(rq, bio); blk_mq_sched_insert_request(rq, false, true, true, true); + return cookie; } else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) blk_mq_run_hw_queue(data.hctx, true);
We must have dropped the ctx before we call blk_mq_sched_insert_request() with can_block=true, otherwise we risk that a flush request can block on insertion if we are currently out of tags. [ 47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002 [ 47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre [ 47.690572] Preemption disabled at: [ 47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280 [ 47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271 [ 47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016 [ 47.718081] Call Trace: [ 47.720903] dump_stack+0x4f/0x73 [ 47.724694] ? blk_mq_sched_get_request+0x6c/0x280 [ 47.730137] __schedule_bug+0x6c/0xc0 [ 47.734314] __schedule+0x559/0x780 [ 47.738302] schedule+0x3b/0x90 [ 47.741899] io_schedule+0x11/0x40 [ 47.745788] blk_mq_get_tag+0x167/0x2a0 [ 47.750162] ? remove_wait_queue+0x70/0x70 [ 47.754901] blk_mq_get_driver_tag+0x92/0xf0 [ 47.759758] blk_mq_sched_insert_request+0x134/0x170 [ 47.765398] ? blk_account_io_start+0xd0/0x270 [ 47.770679] blk_mq_make_request+0x1b2/0x850 [ 47.775766] generic_make_request+0xf7/0x2d0 [ 47.780860] submit_bio+0x5f/0x120 [ 47.784979] ? submit_bio+0x5f/0x120 [ 47.789631] submit_bh_wbc.isra.46+0x10d/0x130 [ 47.794902] submit_bh+0xb/0x10 [ 47.798719] journal_submit_commit_record+0x190/0x210 [ 47.804686] ? _raw_spin_unlock+0x13/0x30 [ 47.809480] jbd2_journal_commit_transaction+0x180a/0x1d00 [ 47.815925] kjournald2+0xb6/0x250 [ 47.820022] ? kjournald2+0xb6/0x250 [ 47.824328] ? remove_wait_queue+0x70/0x70 [ 47.829223] kthread+0x10e/0x140 [ 47.833147] ? commit_timeout+0x10/0x10 [ 47.837742] ? kthread_create_on_node+0x40/0x40 [ 47.843122] ret_from_fork+0x29/0x40 Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request") Signed-off-by: Jens Axboe <axboe@fb.com>