[RFC,1/1] block: get rid of request queue ->sysfs_dir_lock

Message ID	20250120130413.789737-2-nilay@linux.ibm.com (mailing list archive)
State	New
Headers	show Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E1591E0DB5 for <linux-block@vger.kernel.org>; Mon, 20 Jan 2025 13:04:28 +0000 (UTC) From: Nilay Shroff <nilay@linux.ibm.com> To: linux-block@vger.kernel.org Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk, gjoyce@ibm.com Subject: [RFC PATCH 1/1] block: get rid of request queue ->sysfs_dir_lock Date: Mon, 20 Jan 2025 18:34:11 +0530 Message-ID: <20250120130413.789737-2-nilay@linux.ibm.com> In-Reply-To: <20250120130413.789737-1-nilay@linux.ibm.com> References: <20250120130413.789737-1-nilay@linux.ibm.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	remove redundant q->sysfs_dir_lock \| expand [RFC,0/1] remove redundant q->sysfs_dir_lock [RFC,1/1] block: get rid of request queue ->sysfs_dir_lock

Message ID

20250120130413.789737-2-nilay@linux.ibm.com (mailing list archive)

State

New

Headers

From: Nilay Shroff <nilay@linux.ibm.com>
To: linux-block@vger.kernel.org
Cc: hch@lst.de, ming.lei@redhat.com, dlemoal@kernel.org, axboe@kernel.dk,
        gjoyce@ibm.com
Subject: [RFC PATCH 1/1] block: get rid of request queue ->sysfs_dir_lock
Date: Mon, 20 Jan 2025 18:34:11 +0530
Message-ID: <20250120130413.789737-2-nilay@linux.ibm.com>
In-Reply-To: <20250120130413.789737-1-nilay@linux.ibm.com>
References: <20250120130413.789737-1-nilay@linux.ibm.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

remove redundant q->sysfs_dir_lock | expand

Commit Message

Nilay Shroff Jan. 20, 2025, 1:04 p.m. UTC

The request queue uses ->sysfs_dir_lock for protecting the addition/
deletion of kobject entries under sysfs while we register/unregister
blk-mq. However kobject addition/deletion is already protected with
kernfs/sysfs internal synchronization primitives. So use of q->sysfs_
dir_lock seems redundant.

Moreover, q->sysfs_dir_lock is also used at few other callsites along
with q->sysfs_lock for protecting the addition/deletion of kojects.
One such example is when we register with sysfs a set of independent
access ranges for a disk. Here as well we could get rid off q->sysfs_
dir_lock and only use q->sysfs_lock.

The only variable which q->sysfs_dir_lock appears to protect is q->
mq_sysfs_init_done which is set/unset while registering/unregistering
blk-mq with sysfs. But this variable could be easily converted to an
atomic type and used safely without using q->sysfs_dir_lock.

So with this patch we remove q->sysfs_dir_lock from each callsite
and also make q->mq_sysfs_init_done an atomic variable.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-core.c       |  1 -
 block/blk-ia-ranges.c  |  4 ----
 block/blk-mq-sysfs.c   | 25 +++++++------------------
 block/blk-sysfs.c      |  5 -----
 include/linux/blkdev.h |  3 +--
 5 files changed, 8 insertions(+), 30 deletions(-)

Comments

Christoph Hellwig Jan. 22, 2025, 6:28 a.m. UTC | #1

On Mon, Jan 20, 2025 at 06:34:11PM +0530, Nilay Shroff wrote:
> The request queue uses ->sysfs_dir_lock for protecting the addition/
> deletion of kobject entries under sysfs while we register/unregister
> blk-mq. However kobject addition/deletion is already protected with
> kernfs/sysfs internal synchronization primitives. So use of q->sysfs_
> dir_lock seems redundant.

From the pure sysfs perspective, yes.  The weird thing with block
layer sysfs is that unregistration/registration can happen at weird
times, though.

> Moreover, q->sysfs_dir_lock is also used at few other callsites along
> with q->sysfs_lock for protecting the addition/deletion of kojects.
> One such example is when we register with sysfs a set of independent
> access ranges for a disk. Here as well we could get rid off q->sysfs_
> dir_lock and only use q->sysfs_lock.
> 
> The only variable which q->sysfs_dir_lock appears to protect is q->
> mq_sysfs_init_done which is set/unset while registering/unregistering
> blk-mq with sysfs. But this variable could be easily converted to an
> atomic type and used safely without using q->sysfs_dir_lock.
> 

So sysfs_dir_lock absolutely should go.  OTOH relying more on sysfs_lock
is a bad idea, as that is also serialied with the attributes, which
again on a pure sysfs basis isn't needed.  Given that you don't add
new critical sections for it this should be fine, though.

> So with this patch we remove q->sysfs_dir_lock from each callsite
> and also make q->mq_sysfs_init_done an atomic variable.

Using an atomic_t for a single variable is usually not a good idea.
Let's take a step back and look at what mq_sysfs_init_done is trying
to do.

It is set by blk_mq_sysfs_register, which is called by
blk_register_queue. before marking the queue registered and setting the
disk live.  It is cleared in blk_mq_sysfs_unregister called from
blk_unregister_queue just after clearing the queue registered bit.

So maybe we could do something with the queue registered bit, although
eventually I'd like to kill that as well, but either way we need
to explain how the flag prevents the nr_hw_queues update racing with
disk addition/removal.  AFAICS we're not completely getting this
right even right now.  We'd probably need to hold tag_list_lock
over registering and unregistering the hctx sysfs files.

Nilay Shroff Jan. 22, 2025, 6:06 p.m. UTC | #2

On 1/22/25 11:58 AM, Christoph Hellwig wrote:
> On Mon, Jan 20, 2025 at 06:34:11PM +0530, Nilay Shroff wrote:
>> The request queue uses ->sysfs_dir_lock for protecting the addition/
>> deletion of kobject entries under sysfs while we register/unregister
>> blk-mq. However kobject addition/deletion is already protected with
>> kernfs/sysfs internal synchronization primitives. So use of q->sysfs_
>> dir_lock seems redundant.
> 
> From the pure sysfs perspective, yes.  The weird thing with block
> layer sysfs is that unregistration/registration can happen at weird
> times, though.
> 
>> Moreover, q->sysfs_dir_lock is also used at few other callsites along
>> with q->sysfs_lock for protecting the addition/deletion of kojects.
>> One such example is when we register with sysfs a set of independent
>> access ranges for a disk. Here as well we could get rid off q->sysfs_
>> dir_lock and only use q->sysfs_lock.
>>
>> The only variable which q->sysfs_dir_lock appears to protect is q->
>> mq_sysfs_init_done which is set/unset while registering/unregistering
>> blk-mq with sysfs. But this variable could be easily converted to an
>> atomic type and used safely without using q->sysfs_dir_lock.
>>
> 
> So sysfs_dir_lock absolutely should go.  OTOH relying more on sysfs_lock
> is a bad idea, as that is also serialied with the attributes, which
> again on a pure sysfs basis isn't needed.  Given that you don't add
> new critical sections for it this should be fine, though.
> 
>> So with this patch we remove q->sysfs_dir_lock from each callsite
>> and also make q->mq_sysfs_init_done an atomic variable.
> 
> Using an atomic_t for a single variable is usually not a good idea.
> Let's take a step back and look at what mq_sysfs_init_done is trying
> to do.
> 
> It is set by blk_mq_sysfs_register, which is called by
> blk_register_queue. before marking the queue registered and setting the
> disk live.  It is cleared in blk_mq_sysfs_unregister called from
> blk_unregister_queue just after clearing the queue registered bit.
>
Yeah that's good idea! Indeed, I think we can remove ->mq_sysfs_init_done and 
replace it with the QUEUE_FLAG_REGISTERED.
 
> So maybe we could do something with the queue registered bit, although
> eventually I'd like to kill that as well, but either way we need
> to explain how the flag prevents the nr_hw_queues update racing with
> disk addition/removal.  AFAICS we're not completely getting this
> right even right now.  We'd probably need to hold tag_list_lock
> over registering and unregistering the hctx sysfs files.
> 
Agreed, makes sense to hold ->tag_list_lock so that we can contain 
the race between nr_hw_queue update with registering/unregistering the 
hctx sysfs files. 

I'd spin a new patch with above changes and submit.

Thank you Christoph for your review and comments!

--Nilay

diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..d6c4fa3943b5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -430,7 +430,6 @@  struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 	refcount_set(&q->refs, 1);
 	mutex_init(&q->debugfs_mutex);
 	mutex_init(&q->sysfs_lock);
-	mutex_init(&q->sysfs_dir_lock);
 	mutex_init(&q->limits_lock);
 	mutex_init(&q->rq_qos_mutex);
 	spin_lock_init(&q->queue_lock);
diff --git a/block/blk-ia-ranges.c b/block/blk-ia-ranges.c
index c9eb4241e048..d479f5481b66 100644
--- a/block/blk-ia-ranges.c
+++ b/block/blk-ia-ranges.c
@@ -111,7 +111,6 @@  int disk_register_independent_access_ranges(struct gendisk *disk)
 	struct request_queue *q = disk->queue;
 	int i, ret;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
 	lockdep_assert_held(&q->sysfs_lock);
 
 	if (!iars)
@@ -155,7 +154,6 @@  void disk_unregister_independent_access_ranges(struct gendisk *disk)
 	struct blk_independent_access_ranges *iars = disk->ia_ranges;
 	int i;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
 	lockdep_assert_held(&q->sysfs_lock);
 
 	if (!iars)
@@ -289,7 +287,6 @@  void disk_set_independent_access_ranges(struct gendisk *disk,
 {
 	struct request_queue *q = disk->queue;
 
-	mutex_lock(&q->sysfs_dir_lock);
 	mutex_lock(&q->sysfs_lock);
 	if (iars && !disk_check_ia_ranges(disk, iars)) {
 		kfree(iars);
@@ -313,6 +310,5 @@  void disk_set_independent_access_ranges(struct gendisk *disk,
 		disk_register_independent_access_ranges(disk);
 unlock:
 	mutex_unlock(&q->sysfs_lock);
-	mutex_unlock(&q->sysfs_dir_lock);
 }
 EXPORT_SYMBOL_GPL(disk_set_independent_access_ranges);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 156e9bb07abf..0d2226035ab6 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -223,8 +223,6 @@  int blk_mq_sysfs_register(struct gendisk *disk)
 	unsigned long i, j;
 	int ret;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
-
 	ret = kobject_add(q->mq_kobj, &disk_to_dev(disk)->kobj, "mq");
 	if (ret < 0)
 		goto out;
@@ -237,7 +235,7 @@  int blk_mq_sysfs_register(struct gendisk *disk)
 			goto unreg;
 	}
 
-	q->mq_sysfs_init_done = true;
+	atomic_set(&q->mq_sysfs_init_done, 1);
 
 out:
 	return ret;
@@ -259,15 +257,13 @@  void blk_mq_sysfs_unregister(struct gendisk *disk)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	lockdep_assert_held(&q->sysfs_dir_lock);
-
 	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_unregister_hctx(hctx);
 
 	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
 	kobject_del(q->mq_kobj);
 
-	q->mq_sysfs_init_done = false;
+	atomic_set(&q->mq_sysfs_init_done, 0);
 }
 
 void blk_mq_sysfs_unregister_hctxs(struct request_queue *q)
@@ -275,15 +271,11 @@  void blk_mq_sysfs_unregister_hctxs(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	mutex_lock(&q->sysfs_dir_lock);
-	if (!q->mq_sysfs_init_done)
-		goto unlock;
+	if (!atomic_read(&q->mq_sysfs_init_done))
+		return;
 
 	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_unregister_hctx(hctx);
-
-unlock:
-	mutex_unlock(&q->sysfs_dir_lock);
 }
 
 int blk_mq_sysfs_register_hctxs(struct request_queue *q)
@@ -292,9 +284,8 @@  int blk_mq_sysfs_register_hctxs(struct request_queue *q)
 	unsigned long i;
 	int ret = 0;
 
-	mutex_lock(&q->sysfs_dir_lock);
-	if (!q->mq_sysfs_init_done)
-		goto unlock;
+	if (!atomic_read(&q->mq_sysfs_init_done))
+		goto out;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_register_hctx(hctx);
@@ -302,8 +293,6 @@  int blk_mq_sysfs_register_hctxs(struct request_queue *q)
 			break;
 	}
 
-unlock:
-	mutex_unlock(&q->sysfs_dir_lock);
-
+out:
 	return ret;
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 368c2f6f5c85..6f548a4376aa 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -762,7 +762,6 @@  int blk_register_queue(struct gendisk *disk)
 	struct request_queue *q = disk->queue;
 	int ret;
 
-	mutex_lock(&q->sysfs_dir_lock);
 	kobject_init(&disk->queue_kobj, &blk_queue_ktype);
 	ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
 	if (ret < 0)
@@ -803,7 +802,6 @@  int blk_register_queue(struct gendisk *disk)
 	if (q->elevator)
 		kobject_uevent(&q->elevator->kobj, KOBJ_ADD);
 	mutex_unlock(&q->sysfs_lock);
-	mutex_unlock(&q->sysfs_dir_lock);
 
 	/*
 	 * SCSI probing may synchronously create and destroy a lot of
@@ -828,7 +826,6 @@  int blk_register_queue(struct gendisk *disk)
 	mutex_unlock(&q->sysfs_lock);
 out_put_queue_kobj:
 	kobject_put(&disk->queue_kobj);
-	mutex_unlock(&q->sysfs_dir_lock);
 	return ret;
 }
 
@@ -859,7 +856,6 @@  void blk_unregister_queue(struct gendisk *disk)
 	blk_queue_flag_clear(QUEUE_FLAG_REGISTERED, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	mutex_lock(&q->sysfs_dir_lock);
 	/*
 	 * Remove the sysfs attributes before unregistering the queue data
 	 * structures that can be modified through sysfs.
@@ -876,7 +872,6 @@  void blk_unregister_queue(struct gendisk *disk)
 	/* Now that we've deleted all child objects, we can delete the queue. */
 	kobject_uevent(&disk->queue_kobj, KOBJ_REMOVE);
 	kobject_del(&disk->queue_kobj);
-	mutex_unlock(&q->sysfs_dir_lock);
 
 	blk_debugfs_remove(disk);
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 76f0a4e7c2e5..1502970eefff 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -561,7 +561,6 @@  struct request_queue {
 	struct list_head	flush_list;
 
 	struct mutex		sysfs_lock;
-	struct mutex		sysfs_dir_lock;
 	struct mutex		limits_lock;
 
 	/*
@@ -606,7 +605,7 @@  struct request_queue {
 	 */
 	struct mutex		debugfs_mutex;
 
-	bool			mq_sysfs_init_done;
+	atomic_t		mq_sysfs_init_done;
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */

[RFC,1/1] block: get rid of request queue ->sysfs_dir_lock

Commit Message

Comments

Patch