diff mbox

blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

Message ID 1501166846.2516.1.camel@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche July 27, 2017, 2:47 p.m. UTC
On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
> The bug looks like SCSI running the queue inline from IRQ
> context, that's not a good idea. Can you confirm the below works for
> you?
> 
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6097b89d5d3..78740ebf966c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -497,7 +497,7 @@ static void scsi_run_queue(struct request_queue *q)
>  		scsi_starved_list_run(sdev->host);
>  
>  	if (q->mq_ops)
> -		blk_mq_run_hw_queues(q, false);
> +		blk_mq_run_hw_queues(q, true);
>  	else
>  		blk_run_queue(q);
>  }

Hello Jens,

scsi_run_queue() works fine if no scheduler is configured. Additionally, that
code predates the introduction of blk-mq I/O schedulers. I think it is
nontrivial for block driver authors to figure out that a queue has to be run
from process context if a scheduler has been configured that does not support
to be run from interrupt context. How about adding WARN_ON_ONCE(in_interrupt())
to blk_mq_start_hw_queue() or replacing the above patch by the following:


Subject: [PATCH] blk-mq: Make it safe to call blk_mq_start_hw_queues() from interrupt context

blk_mq_start_hw_queues() triggers a queue run. Some functions that
get called to run a queue, e.g. dd_dispatch_request(), are not IRQ-safe.
Hence run the queue asynchronously if blk_mq_start_hw_queues() is called
from interrupt context.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,

Bart.

Comments

Jens Axboe July 27, 2017, 4:26 p.m. UTC | #1
On 07/27/2017 08:47 AM, Bart Van Assche wrote:
> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
>> The bug looks like SCSI running the queue inline from IRQ
>> context, that's not a good idea. Can you confirm the below works for
>> you?
>>
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index f6097b89d5d3..78740ebf966c 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -497,7 +497,7 @@ static void scsi_run_queue(struct request_queue *q)
>>  		scsi_starved_list_run(sdev->host);
>>  
>>  	if (q->mq_ops)
>> -		blk_mq_run_hw_queues(q, false);
>> +		blk_mq_run_hw_queues(q, true);
>>  	else
>>  		blk_run_queue(q);
>>  }
> 
> Hello Jens,
> 
> scsi_run_queue() works fine if no scheduler is configured. Additionally, that
> code predates the introduction of blk-mq I/O schedulers. I think it is
> nontrivial for block driver authors to figure out that a queue has to be run
> from process context if a scheduler has been configured that does not support
> to be run from interrupt context.

No it doesn't, you could never run the queue from interrupt context with
async == false. So I don't think that's confusing at all, you should
always be aware of the context.

> How about adding WARN_ON_ONCE(in_interrupt()) to
> blk_mq_start_hw_queue() or replacing the above patch by the following:

No, I hate having dependencies like that, because they always just catch
one of them. Looks like the IPR path that hits this should just offload
to a workqueue or similar, you don't have to make any scsi_run_queue()
async.
Michael Ellerman July 28, 2017, 6:19 a.m. UTC | #2
Jens Axboe <axboe@kernel.dk> writes:
> On 07/27/2017 08:47 AM, Bart Van Assche wrote:
>> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
>>> The bug looks like SCSI running the queue inline from IRQ
>>> context, that's not a good idea.
...
>> 
>> scsi_run_queue() works fine if no scheduler is configured. Additionally, that
>> code predates the introduction of blk-mq I/O schedulers. I think it is
>> nontrivial for block driver authors to figure out that a queue has to be run
>> from process context if a scheduler has been configured that does not support
>> to be run from interrupt context.
>
> No it doesn't, you could never run the queue from interrupt context with
> async == false. So I don't think that's confusing at all, you should
> always be aware of the context.
>
>> How about adding WARN_ON_ONCE(in_interrupt()) to
>> blk_mq_start_hw_queue() or replacing the above patch by the following:
>
> No, I hate having dependencies like that, because they always just catch
> one of them. Looks like the IPR path that hits this should just offload
> to a workqueue or similar, you don't have to make any scsi_run_queue()
> async.

OK, so the resolution is "fix it in IPR" ?

cheers
diff mbox

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 041f7b7fa0d6..c5cb3b2aabcf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1251,7 +1251,7 @@  void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
 	clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
 
-	blk_mq_run_hw_queue(hctx, false);
+	blk_mq_run_hw_queue(hctx, in_interrupt());
 }
 EXPORT_SYMBOL(blk_mq_start_hw_queue);