diff mbox

blk-mq: fix schedule-while-atomic with scheduler attached

Message ID bf3c52f2-3150-1d0c-2ae3-740032707129@kernel.dk (mailing list archive)
State New, archived
Headers show

Commit Message

Jens Axboe April 20, 2017, 9:13 p.m. UTC
We must have dropped the ctx before we call
blk_mq_sched_insert_request() with can_block=true, otherwise we risk
that a flush request can block on insertion if we are currently out of
tags.

[   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
[   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
[   47.690572] Preemption disabled at:
[   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
[   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
[   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
[   47.718081] Call Trace:
[   47.720903]  dump_stack+0x4f/0x73
[   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
[   47.730137]  __schedule_bug+0x6c/0xc0
[   47.734314]  __schedule+0x559/0x780
[   47.738302]  schedule+0x3b/0x90
[   47.741899]  io_schedule+0x11/0x40
[   47.745788]  blk_mq_get_tag+0x167/0x2a0
[   47.750162]  ? remove_wait_queue+0x70/0x70
[   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
[   47.759758]  blk_mq_sched_insert_request+0x134/0x170
[   47.765398]  ? blk_account_io_start+0xd0/0x270
[   47.770679]  blk_mq_make_request+0x1b2/0x850
[   47.775766]  generic_make_request+0xf7/0x2d0
[   47.780860]  submit_bio+0x5f/0x120
[   47.784979]  ? submit_bio+0x5f/0x120
[   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
[   47.794902]  submit_bh+0xb/0x10
[   47.798719]  journal_submit_commit_record+0x190/0x210
[   47.804686]  ? _raw_spin_unlock+0x13/0x30
[   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
[   47.815925]  kjournald2+0xb6/0x250
[   47.820022]  ? kjournald2+0xb6/0x250
[   47.824328]  ? remove_wait_queue+0x70/0x70
[   47.829223]  kthread+0x10e/0x140
[   47.833147]  ? commit_timeout+0x10/0x10
[   47.837742]  ? kthread_create_on_node+0x40/0x40
[   47.843122]  ret_from_fork+0x29/0x40

Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
Signed-off-by: Jens Axboe <axboe@fb.com>

Comments

Omar Sandoval April 20, 2017, 9:30 p.m. UTC | #1
On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote:
> We must have dropped the ctx before we call
> blk_mq_sched_insert_request() with can_block=true, otherwise we risk
> that a flush request can block on insertion if we are currently out of
> tags.
> 
> [   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
> [   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
> [   47.690572] Preemption disabled at:
> [   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
> [   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
> [   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
> [   47.718081] Call Trace:
> [   47.720903]  dump_stack+0x4f/0x73
> [   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
> [   47.730137]  __schedule_bug+0x6c/0xc0
> [   47.734314]  __schedule+0x559/0x780
> [   47.738302]  schedule+0x3b/0x90
> [   47.741899]  io_schedule+0x11/0x40
> [   47.745788]  blk_mq_get_tag+0x167/0x2a0
> [   47.750162]  ? remove_wait_queue+0x70/0x70
> [   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
> [   47.759758]  blk_mq_sched_insert_request+0x134/0x170
> [   47.765398]  ? blk_account_io_start+0xd0/0x270
> [   47.770679]  blk_mq_make_request+0x1b2/0x850
> [   47.775766]  generic_make_request+0xf7/0x2d0
> [   47.780860]  submit_bio+0x5f/0x120
> [   47.784979]  ? submit_bio+0x5f/0x120
> [   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
> [   47.794902]  submit_bh+0xb/0x10
> [   47.798719]  journal_submit_commit_record+0x190/0x210
> [   47.804686]  ? _raw_spin_unlock+0x13/0x30
> [   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
> [   47.815925]  kjournald2+0xb6/0x250
> [   47.820022]  ? kjournald2+0xb6/0x250
> [   47.824328]  ? remove_wait_queue+0x70/0x70
> [   47.829223]  kthread+0x10e/0x140
> [   47.833147]  ? commit_timeout+0x10/0x10
> [   47.837742]  ? kthread_create_on_node+0x40/0x40
> [   47.843122]  ret_from_fork+0x29/0x40
> 
> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
> Signed-off-by: Jens Axboe <axboe@fb.com>
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9d7645f24b05..323eed50d615 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
>  		return cookie;
>  	} else if (q->elevator) {
> +		blk_mq_put_ctx(data.ctx);
>  		blk_mq_bio_to_request(rq, bio);
>  		blk_mq_sched_insert_request(rq, false, true, true, true);
> +		return cookie;
>  	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio))
>  		blk_mq_run_hw_queue(data.hctx, true);
>  
> 

I'm confused, the first thing we check in make_request is:

	if (unlikely(is_flush_fua)) {
		blk_mq_bio_to_request(rq, bio);
		if (q->elevator) {
			blk_mq_sched_insert_request(rq, false, true, true,
					true);
		} else {
			blk_insert_flush(rq);
			blk_mq_run_hw_queue(data.hctx, true);
		}
	}

and can_block doesn't do anything in the !flush case, so shouldn't it be
changed in that one instead?

Alternatively, blk_mq_get_tag() knows how to drop the ctx before it
sleeps, but blk_mq_get_driver_tag() doesn't set data.ctx. I'm not sure
if that'll do what we want, and this is making my head hurt.
Jens Axboe April 20, 2017, 10:39 p.m. UTC | #2
On 04/20/2017 03:30 PM, Omar Sandoval wrote:
> On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote:
>> We must have dropped the ctx before we call
>> blk_mq_sched_insert_request() with can_block=true, otherwise we risk
>> that a flush request can block on insertion if we are currently out of
>> tags.
>>
>> [   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
>> [   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
>> [   47.690572] Preemption disabled at:
>> [   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
>> [   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
>> [   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
>> [   47.718081] Call Trace:
>> [   47.720903]  dump_stack+0x4f/0x73
>> [   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
>> [   47.730137]  __schedule_bug+0x6c/0xc0
>> [   47.734314]  __schedule+0x559/0x780
>> [   47.738302]  schedule+0x3b/0x90
>> [   47.741899]  io_schedule+0x11/0x40
>> [   47.745788]  blk_mq_get_tag+0x167/0x2a0
>> [   47.750162]  ? remove_wait_queue+0x70/0x70
>> [   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
>> [   47.759758]  blk_mq_sched_insert_request+0x134/0x170
>> [   47.765398]  ? blk_account_io_start+0xd0/0x270
>> [   47.770679]  blk_mq_make_request+0x1b2/0x850
>> [   47.775766]  generic_make_request+0xf7/0x2d0
>> [   47.780860]  submit_bio+0x5f/0x120
>> [   47.784979]  ? submit_bio+0x5f/0x120
>> [   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
>> [   47.794902]  submit_bh+0xb/0x10
>> [   47.798719]  journal_submit_commit_record+0x190/0x210
>> [   47.804686]  ? _raw_spin_unlock+0x13/0x30
>> [   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
>> [   47.815925]  kjournald2+0xb6/0x250
>> [   47.820022]  ? kjournald2+0xb6/0x250
>> [   47.824328]  ? remove_wait_queue+0x70/0x70
>> [   47.829223]  kthread+0x10e/0x140
>> [   47.833147]  ? commit_timeout+0x10/0x10
>> [   47.837742]  ? kthread_create_on_node+0x40/0x40
>> [   47.843122]  ret_from_fork+0x29/0x40
>>
>> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 9d7645f24b05..323eed50d615 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>  		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
>>  		return cookie;
>>  	} else if (q->elevator) {
>> +		blk_mq_put_ctx(data.ctx);
>>  		blk_mq_bio_to_request(rq, bio);
>>  		blk_mq_sched_insert_request(rq, false, true, true, true);
>> +		return cookie;
>>  	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio))
>>  		blk_mq_run_hw_queue(data.hctx, true);
>>  
>>
> 
> I'm confused, the first thing we check in make_request is:
> 
> 	if (unlikely(is_flush_fua)) {
> 		blk_mq_bio_to_request(rq, bio);
> 		if (q->elevator) {
> 			blk_mq_sched_insert_request(rq, false, true, true,
> 					true);
> 		} else {
> 			blk_insert_flush(rq);
> 			blk_mq_run_hw_queue(data.hctx, true);
> 		}
> 	}
> 
> and can_block doesn't do anything in the !flush case, so shouldn't it be
> changed in that one instead?

Just to get closure on this issue, the two cases ends up being folded
into one. So we're really triggering the first case, but it's a jump
to the 2nd one.

Both cases should still be fixed up, the 2nd patch I sent out should be
fine.
Omar Sandoval April 20, 2017, 10:41 p.m. UTC | #3
On Thu, Apr 20, 2017 at 04:39:10PM -0600, Jens Axboe wrote:
> On 04/20/2017 03:30 PM, Omar Sandoval wrote:
> > On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote:
> >> We must have dropped the ctx before we call
> >> blk_mq_sched_insert_request() with can_block=true, otherwise we risk
> >> that a flush request can block on insertion if we are currently out of
> >> tags.
> >>
> >> [   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
> >> [   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
> >> [   47.690572] Preemption disabled at:
> >> [   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
> >> [   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
> >> [   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
> >> [   47.718081] Call Trace:
> >> [   47.720903]  dump_stack+0x4f/0x73
> >> [   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
> >> [   47.730137]  __schedule_bug+0x6c/0xc0
> >> [   47.734314]  __schedule+0x559/0x780
> >> [   47.738302]  schedule+0x3b/0x90
> >> [   47.741899]  io_schedule+0x11/0x40
> >> [   47.745788]  blk_mq_get_tag+0x167/0x2a0
> >> [   47.750162]  ? remove_wait_queue+0x70/0x70
> >> [   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
> >> [   47.759758]  blk_mq_sched_insert_request+0x134/0x170
> >> [   47.765398]  ? blk_account_io_start+0xd0/0x270
> >> [   47.770679]  blk_mq_make_request+0x1b2/0x850
> >> [   47.775766]  generic_make_request+0xf7/0x2d0
> >> [   47.780860]  submit_bio+0x5f/0x120
> >> [   47.784979]  ? submit_bio+0x5f/0x120
> >> [   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
> >> [   47.794902]  submit_bh+0xb/0x10
> >> [   47.798719]  journal_submit_commit_record+0x190/0x210
> >> [   47.804686]  ? _raw_spin_unlock+0x13/0x30
> >> [   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
> >> [   47.815925]  kjournald2+0xb6/0x250
> >> [   47.820022]  ? kjournald2+0xb6/0x250
> >> [   47.824328]  ? remove_wait_queue+0x70/0x70
> >> [   47.829223]  kthread+0x10e/0x140
> >> [   47.833147]  ? commit_timeout+0x10/0x10
> >> [   47.837742]  ? kthread_create_on_node+0x40/0x40
> >> [   47.843122]  ret_from_fork+0x29/0x40
> >>
> >> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
> >> Signed-off-by: Jens Axboe <axboe@fb.com>
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 9d7645f24b05..323eed50d615 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >>  		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
> >>  		return cookie;
> >>  	} else if (q->elevator) {
> >> +		blk_mq_put_ctx(data.ctx);
> >>  		blk_mq_bio_to_request(rq, bio);
> >>  		blk_mq_sched_insert_request(rq, false, true, true, true);
> >> +		return cookie;
> >>  	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio))
> >>  		blk_mq_run_hw_queue(data.hctx, true);
> >>  
> >>
> > 
> > I'm confused, the first thing we check in make_request is:
> > 
> > 	if (unlikely(is_flush_fua)) {
> > 		blk_mq_bio_to_request(rq, bio);
> > 		if (q->elevator) {
> > 			blk_mq_sched_insert_request(rq, false, true, true,
> > 					true);
> > 		} else {
> > 			blk_insert_flush(rq);
> > 			blk_mq_run_hw_queue(data.hctx, true);
> > 		}
> > 	}
> > 
> > and can_block doesn't do anything in the !flush case, so shouldn't it be
> > changed in that one instead?
> 
> Just to get closure on this issue, the two cases ends up being folded
> into one. So we're really triggering the first case, but it's a jump
> to the 2nd one.
> 
> Both cases should still be fixed up, the 2nd patch I sent out should be
> fine.

You can add

Reviewed-by: Omar Sandoval <osandov@fb.com>

for the 2nd patch.
Jens Axboe April 20, 2017, 10:42 p.m. UTC | #4
On 04/20/2017 04:41 PM, Omar Sandoval wrote:
> On Thu, Apr 20, 2017 at 04:39:10PM -0600, Jens Axboe wrote:
>> On 04/20/2017 03:30 PM, Omar Sandoval wrote:
>>> On Thu, Apr 20, 2017 at 03:13:43PM -0600, Jens Axboe wrote:
>>>> We must have dropped the ctx before we call
>>>> blk_mq_sched_insert_request() with can_block=true, otherwise we risk
>>>> that a flush request can block on insertion if we are currently out of
>>>> tags.
>>>>
>>>> [   47.667190] BUG: scheduling while atomic: jbd2/sda2-8/2089/0x00000002
>>>> [   47.674493] Modules linked in: x86_pkg_temp_thermal btrfs xor zlib_deflate raid6_pq sr_mod cdre
>>>> [   47.690572] Preemption disabled at:
>>>> [   47.690584] [<ffffffff81326c7c>] blk_mq_sched_get_request+0x6c/0x280
>>>> [   47.701764] CPU: 1 PID: 2089 Comm: jbd2/sda2-8 Not tainted 4.11.0-rc7+ #271
>>>> [   47.709630] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
>>>> [   47.718081] Call Trace:
>>>> [   47.720903]  dump_stack+0x4f/0x73
>>>> [   47.724694]  ? blk_mq_sched_get_request+0x6c/0x280
>>>> [   47.730137]  __schedule_bug+0x6c/0xc0
>>>> [   47.734314]  __schedule+0x559/0x780
>>>> [   47.738302]  schedule+0x3b/0x90
>>>> [   47.741899]  io_schedule+0x11/0x40
>>>> [   47.745788]  blk_mq_get_tag+0x167/0x2a0
>>>> [   47.750162]  ? remove_wait_queue+0x70/0x70
>>>> [   47.754901]  blk_mq_get_driver_tag+0x92/0xf0
>>>> [   47.759758]  blk_mq_sched_insert_request+0x134/0x170
>>>> [   47.765398]  ? blk_account_io_start+0xd0/0x270
>>>> [   47.770679]  blk_mq_make_request+0x1b2/0x850
>>>> [   47.775766]  generic_make_request+0xf7/0x2d0
>>>> [   47.780860]  submit_bio+0x5f/0x120
>>>> [   47.784979]  ? submit_bio+0x5f/0x120
>>>> [   47.789631]  submit_bh_wbc.isra.46+0x10d/0x130
>>>> [   47.794902]  submit_bh+0xb/0x10
>>>> [   47.798719]  journal_submit_commit_record+0x190/0x210
>>>> [   47.804686]  ? _raw_spin_unlock+0x13/0x30
>>>> [   47.809480]  jbd2_journal_commit_transaction+0x180a/0x1d00
>>>> [   47.815925]  kjournald2+0xb6/0x250
>>>> [   47.820022]  ? kjournald2+0xb6/0x250
>>>> [   47.824328]  ? remove_wait_queue+0x70/0x70
>>>> [   47.829223]  kthread+0x10e/0x140
>>>> [   47.833147]  ? commit_timeout+0x10/0x10
>>>> [   47.837742]  ? kthread_create_on_node+0x40/0x40
>>>> [   47.843122]  ret_from_fork+0x29/0x40
>>>>
>>>> Fixes: a4d907b6a33b ("blk-mq: streamline blk_mq_make_request")
>>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 9d7645f24b05..323eed50d615 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -1634,8 +1634,10 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>>>>  		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
>>>>  		return cookie;
>>>>  	} else if (q->elevator) {
>>>> +		blk_mq_put_ctx(data.ctx);
>>>>  		blk_mq_bio_to_request(rq, bio);
>>>>  		blk_mq_sched_insert_request(rq, false, true, true, true);
>>>> +		return cookie;
>>>>  	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio))
>>>>  		blk_mq_run_hw_queue(data.hctx, true);
>>>>  
>>>>
>>>
>>> I'm confused, the first thing we check in make_request is:
>>>
>>> 	if (unlikely(is_flush_fua)) {
>>> 		blk_mq_bio_to_request(rq, bio);
>>> 		if (q->elevator) {
>>> 			blk_mq_sched_insert_request(rq, false, true, true,
>>> 					true);
>>> 		} else {
>>> 			blk_insert_flush(rq);
>>> 			blk_mq_run_hw_queue(data.hctx, true);
>>> 		}
>>> 	}
>>>
>>> and can_block doesn't do anything in the !flush case, so shouldn't it be
>>> changed in that one instead?
>>
>> Just to get closure on this issue, the two cases ends up being folded
>> into one. So we're really triggering the first case, but it's a jump
>> to the 2nd one.
>>
>> Both cases should still be fixed up, the 2nd patch I sent out should be
>> fine.
> 
> You can add
> 
> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> for the 2nd patch.

Thanks Omar, queued up for 4.12.
diff mbox

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9d7645f24b05..323eed50d615 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1634,8 +1634,10 @@  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
 		return cookie;
 	} else if (q->elevator) {
+		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_sched_insert_request(rq, false, true, true, true);
+		return cookie;
 	} else if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio))
 		blk_mq_run_hw_queue(data.hctx, true);