diff mbox series

block: Fix potential deadlock warning in blk_mq_mark_tag_wait

Message ID 20240814113542.911023-1-lilingfeng@huaweicloud.com (mailing list archive)
State New, archived
Headers show
Series block: Fix potential deadlock warning in blk_mq_mark_tag_wait | expand

Commit Message

Li Lingfeng Aug. 14, 2024, 11:35 a.m. UTC
From: Li Lingfeng <lilingfeng3@huawei.com>

When interrupt is turned on while a lock holding by spin_lock_irq it
throws a warning because of potential deadlock.

blk_mq_mark_tag_wait
 spin_lock_irq(&wq->lock)
      --> turn off interrupt and get lockA
 blk_mq_get_driver_tag
  __blk_mq_tag_busy
   spin_lock_irq(&tags->lock)
   spin_unlock_irq(&tags->lock)
      --> release lockB and turn on interrupt accidentally
...
  --> Interrupt may be triggered and try get wq->lock while it is held
                                           blk_complete_reqs
                                            ...
                                            blk_mq_put_tag
                                             ...
                                             __wake_up_common_lock
                                              spin_lock_irqsave
                                              --> try get lock again
...
 spin_unlock_irq(&wq->lock)

Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
 block/blk-mq-tag.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Aug. 14, 2024, 7:52 p.m. UTC | #1
On 8/14/24 4:35 AM, Li Lingfeng wrote:
> When interrupt is turned on while a lock holding by spin_lock_irq it
> throws a warning because of potential deadlock.

Which tool reported the warning? Please mention this in the patch
description.
> 
> blk_mq_mark_tag_wait
>   spin_lock_irq(&wq->lock)
>        --> turn off interrupt and get lockA
>   blk_mq_get_driver_tag
>    __blk_mq_tag_busy
>     spin_lock_irq(&tags->lock)
>     spin_unlock_irq(&tags->lock)
>        --> release lockB and turn on interrupt accidentally

The above call chain does not match the code in Linus' master tree.
Please fix this.

> Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
> Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>

Please leave a blank line between the patch description and the section
with tags.

> -	spin_lock_irq(&tags->lock);
> +	spin_lock_irqsave(&tags->lock, flags);
>   	users = tags->active_queues + 1;
>   	WRITE_ONCE(tags->active_queues, users);
>   	blk_mq_update_wake_batch(tags, users);
> -	spin_unlock_irq(&tags->lock);
> +	spin_unlock_irqrestore(&tags->lock, flags);
>   }

The code changes however look good to me.

Thanks,

Bart.
Jens Axboe Aug. 14, 2024, 8:22 p.m. UTC | #2
On 8/14/24 1:52 PM, Bart Van Assche wrote:
> On 8/14/24 4:35 AM, Li Lingfeng wrote:
>> When interrupt is turned on while a lock holding by spin_lock_irq it
>> throws a warning because of potential deadlock.
> 
> Which tool reported the warning? Please mention this in the patch
> description.
>>
>> blk_mq_mark_tag_wait
>>   spin_lock_irq(&wq->lock)
>>        --> turn off interrupt and get lockA
>>   blk_mq_get_driver_tag
>>    __blk_mq_tag_busy
>>     spin_lock_irq(&tags->lock)
>>     spin_unlock_irq(&tags->lock)
>>        --> release lockB and turn on interrupt accidentally
> 
> The above call chain does not match the code in Linus' master tree.
> Please fix this.
> 
>> Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
>> Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> 
> Please leave a blank line between the patch description and the section
> with tags.

Please just include the actual lockdep trace rather than a doctored up
one, it's a lot more descriptive. And use the real lock names rather
than turn it into hypotheticals.
Yu Kuai Aug. 15, 2024, 1:54 a.m. UTC | #3
Hi,

在 2024/08/15 4:22, Jens Axboe 写道:
> On 8/14/24 1:52 PM, Bart Van Assche wrote:
>> On 8/14/24 4:35 AM, Li Lingfeng wrote:
>>> When interrupt is turned on while a lock holding by spin_lock_irq it
>>> throws a warning because of potential deadlock.
>>
>> Which tool reported the warning? Please mention this in the patch
>> description.
>>>
>>> blk_mq_mark_tag_wait
>>>    spin_lock_irq(&wq->lock)
>>>         --> turn off interrupt and get lockA
>>>    blk_mq_get_driver_tag
>>>     __blk_mq_tag_busy
>>>      spin_lock_irq(&tags->lock)
>>>      spin_unlock_irq(&tags->lock)
>>>         --> release lockB and turn on interrupt accidentally

This looks correct, however, many details are hidden:

t1: IO dispatch
blk_mq_prep_dispatch_rq
  blk_mq_get_driver_tag
   __blk_mq_get_driver_tag
    __blk_mq_alloc_driver_tag
     blk_mq_tag_busy -> tag is already busy
     // failed to get driver tag

  blk_mq_mark_tag_wait
   spin_lock_irq(&wq->lock) -> lock A
   __add_wait_queue(wq, wait) -> wait queue active
   blk_mq_get_driver_tag
   __blk_mq_tag_busy
-> 1) tag must be idle, which means there can't be inflight IO
    spin_lock_irq(&tags->lock) -> lock B
    spin_unlock_irq(&tags->lock) -> unlock B
-> 2) context must be preempt by IO interrupt to trigger deadlock.

So, the deadlock is not possible in theory, there can't be inflight IO
if __blk_mq_tag_busy what to hold the second lock, while deadlock
require IO to be done.

Any way, the change looks good to me.

Thanks,
Kuai

>>
>> The above call chain does not match the code in Linus' master tree.
>> Please fix this.
>>
>>> Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
>>> Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>
>> Please leave a blank line between the patch description and the section
>> with tags.
> 
> Please just include the actual lockdep trace rather than a doctored up
> one, it's a lot more descriptive. And use the real lock names rather
> than turn it into hypotheticals.
>
diff mbox series

Patch

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index cc57e2dd9a0b..2cafcf11ee8b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -38,6 +38,7 @@  static void blk_mq_update_wake_batch(struct blk_mq_tags *tags,
 void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
 	unsigned int users;
+	unsigned long flags;
 	struct blk_mq_tags *tags = hctx->tags;
 
 	/*
@@ -56,11 +57,11 @@  void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 			return;
 	}
 
-	spin_lock_irq(&tags->lock);
+	spin_lock_irqsave(&tags->lock, flags);
 	users = tags->active_queues + 1;
 	WRITE_ONCE(tags->active_queues, users);
 	blk_mq_update_wake_batch(tags, users);
-	spin_unlock_irq(&tags->lock);
+	spin_unlock_irqrestore(&tags->lock, flags);
 }
 
 /*