
[RFC,PATCHv2,2/2] block: fix nr_hw_queue update racing with disk addition/removal

Message ID 20250123174124.24554-3-nilay@linux.ibm.com (mailing list archive)
State New
Series block: remove q->sysfs_dir_lock and fix race updating nr_hw_queue

Commit Message

Nilay Shroff Jan. 23, 2025, 5:40 p.m. UTC
The nr_hw_queue update could potentially race with disk addition/removal
while registering/unregistering hctx sysfs files.
__blk_mq_update_nr_hw_queues() runs with q->tag_set->tag_list_lock held,
so to avoid racing with disk addition/removal we should acquire the same
lock while registering/unregistering hctx sysfs files.

With this patch, blk_mq_sysfs_register() (called during disk addition)
and blk_mq_sysfs_unregister() (called during disk removal) now run with
q->tag_set->tag_list_lock held, so they can no longer race with
__blk_mq_update_nr_hw_queues().
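
For context, the nr_hw_queues update side already serializes on this
lock; the exported wrapper in block/blk-mq.c is roughly:

	void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
			int nr_hw_queues)
	{
		mutex_lock(&set->tag_list_lock);
		__blk_mq_update_nr_hw_queues(set, nr_hw_queues);
		mutex_unlock(&set->tag_list_lock);
	}

so holding the same lock in the sysfs register/unregister paths is
sufficient to serialize them against the update.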

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-sysfs.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Jan. 28, 2025, 5:49 a.m. UTC | #1
On Thu, Jan 23, 2025 at 11:10:24PM +0530, Nilay Shroff wrote:
>  
> +	mutex_lock(&q->tag_set->tag_list_lock);
> +
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		ret = blk_mq_register_hctx(hctx);
> -		if (ret)
> +		if (ret) {
> +			mutex_unlock(&q->tag_set->tag_list_lock);
>  			goto unreg;
> +		}
>  	}
>  
> +	mutex_unlock(&q->tag_set->tag_list_lock);
>  
>  out:
>  	return ret;
>  
>  unreg:
> +	mutex_lock(&q->tag_set->tag_list_lock);
> +

No real need for an unlock/lock cycle here, I think.  Also as something
that is really just a nitpick, I love to keep the locks for the critical
sections close to the code they're protecting.  e.g. format this as:

	if (ret < 0)
		return ret;

	mutex_lock(&q->tag_set->tag_list_lock);
	queue_for_each_hw_ctx(q, hctx, i) {
		ret = blk_mq_register_hctx(hctx);
		if (ret)
			goto out_unregister;
	}
	mutex_unlock(&q->tag_set->tag_list_lock);
	return 0;

out_unregister:
	queue_for_each_hw_ctx(q, hctx, j) {
 		if (j < i)
			blk_mq_unregister_hctx(hctx);
	}
	mutex_unlock(&q->tag_set->tag_list_lock);

...

(and similar for blk_mq_sysfs_unregister).
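
For blk_mq_sysfs_unregister, which has no error path, a sketch of the
same structure (reusing the prologue of the existing function) would
simply be:

	mutex_lock(&q->tag_set->tag_list_lock);
	queue_for_each_hw_ctx(q, hctx, i)
		blk_mq_unregister_hctx(hctx);
	mutex_unlock(&q->tag_set->tag_list_lock);

	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
	kobject_del(q->mq_kobj);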

I assume you did run this through blktests and xfstests with lockdep
enabled to catch if we created some new lock ordering problems?
I can't think of any right now, but it's good to validate that.
Nilay Shroff Jan. 28, 2025, 7:11 a.m. UTC | #2
On 1/28/25 11:19 AM, Christoph Hellwig wrote:
> On Thu, Jan 23, 2025 at 11:10:24PM +0530, Nilay Shroff wrote:
>>  
>> +	mutex_lock(&q->tag_set->tag_list_lock);
>> +
>>  	queue_for_each_hw_ctx(q, hctx, i) {
>>  		ret = blk_mq_register_hctx(hctx);
>> -		if (ret)
>> +		if (ret) {
>> +			mutex_unlock(&q->tag_set->tag_list_lock);
>>  			goto unreg;
>> +		}
>>  	}
>>  
>> +	mutex_unlock(&q->tag_set->tag_list_lock);
>>  
>>  out:
>>  	return ret;
>>  
>>  unreg:
>> +	mutex_lock(&q->tag_set->tag_list_lock);
>> +
> 
> No real need for an unlock/lock cycle here, I think.  Also as something
> that is really just a nitpick, I love to keep the locks for the critical
> sections close to the code they're protecting.  e.g. format this as:
> 
> 	if (ret < 0)
> 		return ret;
> 
> 	mutex_lock(&q->tag_set->tag_list_lock);
> 	queue_for_each_hw_ctx(q, hctx, i) {
> 		ret = blk_mq_register_hctx(hctx);
> 		if (ret)
> 			goto out_unregister;
> 	}
> 	mutex_unlock(&q->tag_set->tag_list_lock);
> 	return 0;
> 
> out_unregister:
> 	queue_for_each_hw_ctx(q, hctx, j) {
>  		if (j < i)
> 			blk_mq_unregister_hctx(hctx);
> 	}
> 	mutex_unlock(&q->tag_set->tag_list_lock);
> 
> ...
> 
> (and similar for blk_mq_sysfs_unregister).

Yes, looks good. I will update the code and send the next patch.

> I assume you did run this through blktests and xfstests with lockdep
> enabled to catch if we created some new lock ordering problems?
> I can't think of any right now, but it's good to validate that.
> 
Yeah, I ran blktests with lockdep enabled before submitting the patch to
ensure that lockdep doesn't generate any warning with these changes.
However, I didn't run xfstests. Anyway, I will now run both blktests and
xfstests before submitting the next patch.

Thanks,
--Nilay

Patch

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 6113328abd70..2b9df2b73bf6 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -229,22 +229,31 @@  int blk_mq_sysfs_register(struct gendisk *disk)
 
 	kobject_uevent(q->mq_kobj, KOBJ_ADD);
 
+	mutex_lock(&q->tag_set->tag_list_lock);
+
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_register_hctx(hctx);
-		if (ret)
+		if (ret) {
+			mutex_unlock(&q->tag_set->tag_list_lock);
 			goto unreg;
+		}
 	}
 
+	mutex_unlock(&q->tag_set->tag_list_lock);
 
 out:
 	return ret;
 
 unreg:
+	mutex_lock(&q->tag_set->tag_list_lock);
+
 	queue_for_each_hw_ctx(q, hctx, j) {
 		if (j < i)
 			blk_mq_unregister_hctx(hctx);
 	}
 
+	mutex_unlock(&q->tag_set->tag_list_lock);
+
 	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
 	kobject_del(q->mq_kobj);
 	return ret;
@@ -256,10 +265,13 @@  void blk_mq_sysfs_unregister(struct gendisk *disk)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
+	mutex_lock(&q->tag_set->tag_list_lock);
 
 	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_unregister_hctx(hctx);
 
+	mutex_unlock(&q->tag_set->tag_list_lock);
+
 	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
 	kobject_del(q->mq_kobj);
 }