
[v2,5/7] block: Delay default elevator initialization

Message ID: 20190828022947.23364-6-damien.lemoal@wdc.com
State: Superseded
Series: Elevator cleanups and improvements

Commit Message

Damien Le Moal Aug. 28, 2019, 2:29 a.m. UTC
When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is its number of hardware
queues, as the device driver has not yet completed its scan of the
block device. The device type and the elevator features that the
device requires are not set yet, which prevents selecting the default
elevator most suitable for the device.

This currently affects all multi-queue zoned block devices, which
default to the "none" elevator instead of the required "mq-deadline"
elevator. Such devices include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by moving the call to elevator_init_mq() from
blk_mq_init_allocated_queue() into __device_add_disk(), allowing the
device driver to probe the device characteristics and set the
attributes of the device request queue before the elevator is
initialized.

Also, to make sure that the elevator is never initialized while
requests are in flight (there should be none when the device driver
calls device_add_disk()), freeze and quiesce the device request queue
before calling blk_mq_init_sched().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/blk-mq.c   | 2 --
 block/elevator.c | 7 +++++++
 block/genhd.c    | 3 +++
 3 files changed, 10 insertions(+), 2 deletions(-)
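
For context, here is a simplified sketch (not code from this series) of
what default elevator selection has to look at. The
required_elevator_features field and the elevator_get_by_features()
helper are assumed to be provided by the earlier patches of this series,
and pick_default_elevator() is only an illustrative name. The point is
that none of these queue attributes are known when
blk_mq_init_allocated_queue() runs.

/* Illustrative only -- meant as block/elevator.c context. */
static struct elevator_type *pick_default_elevator(struct request_queue *q)
{
	/*
	 * Elevator features required by the device (e.g. sequential
	 * write ordering for zoned block devices) are set by the
	 * driver during probe, after the queue has been allocated.
	 */
	if (q->required_elevator_features)
		return elevator_get_by_features(q);

	/* Single hardware queue: default to mq-deadline. */
	if (q->nr_hw_queues == 1)
		return elevator_get(q, "mq-deadline", false);

	/* Multiple hardware queues: no default elevator ("none"). */
	return NULL;
}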

Comments

Christoph Hellwig Sept. 3, 2019, 9:02 a.m. UTC | #1
On Wed, Aug 28, 2019 at 11:29:45AM +0900, Damien Le Moal wrote:
> When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
> the only information known about the device is the number of hardware
> queues as the block device scan by the device driver is not completed
> yet. The device type and the device required features are not set yet,
> preventing to correctly choose the default elevator most suitable for
> the device.
> 
> This currently affects all multi-queue zoned block devices which default
> to the "none" elevator instead of the required "mq-deadline" elevator.
> These drives currently include host-managed SMR disks connected to a
> smartpqi HBA and null_blk block devices with zoned mode enabled.
> Upcoming NVMe Zoned Namespace devices will also be affected.
> 
> Fix this by moving the execution of elevator_init_mq() from
> blk_mq_init_allocated_queue() into __device_add_disk() to allow for the
> device driver to probe the device characteristics and set attributes
> of the device request queue prior to the elevator initialization.
> 
> Also to make sure that the elevator initialization is never done while
> requests are in-flight (there should be none when the device driver
> calls device_add_disk()), freeze and quiesce the device request queue
> before executing blk_mq_init_sched().

So the disk can be accessed from user space or by partition probing
once we have registered the region.  Based on that I think it would be
better if we set the elevator a little earlier, before that happens.
With that we shouldn't have to freeze the queue.
Damien Le Moal Sept. 4, 2019, 2:07 a.m. UTC | #2
On 2019/09/03 18:02, Christoph Hellwig wrote:
> On Wed, Aug 28, 2019 at 11:29:45AM +0900, Damien Le Moal wrote:
>> When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
>> the only information known about the device is the number of hardware
>> queues as the block device scan by the device driver is not completed
>> yet. The device type and the device required features are not set yet,
>> preventing to correctly choose the default elevator most suitable for
>> the device.
>>
>> This currently affects all multi-queue zoned block devices which default
>> to the "none" elevator instead of the required "mq-deadline" elevator.
>> These drives currently include host-managed SMR disks connected to a
>> smartpqi HBA and null_blk block devices with zoned mode enabled.
>> Upcoming NVMe Zoned Namespace devices will also be affected.
>>
>> Fix this by moving the execution of elevator_init_mq() from
>> blk_mq_init_allocated_queue() into __device_add_disk() to allow for the
>> device driver to probe the device characteristics and set attributes
>> of the device request queue prior to the elevator initialization.
>>
>> Also to make sure that the elevator initialization is never done while
>> requests are in-flight (there should be none when the device driver
>> calls device_add_disk()), freeze and quiesce the device request queue
>> before executing blk_mq_init_sched().
> 
> So the disk can be accessed from userspace or partition probing once we
> registered the region.  Based on that I think it would be better if
> we set the elevator a little earlier before that happens.  With that
> we shouldn't have to freeze the queue.
> 

OK. I will move the elevator initialization earlier in device_add_disk(),
before the region registration.

However, I would still like to keep the queue freeze to protect against buggy
device drivers that call device_add_disk() with internal commands still in
flight. I do not think that there are any such drivers, but I just want to
avoid problems. The queue freeze is also done for any user-initiated elevator
change, so in this respect this is not any different and should not be a big
problem. Thoughts?
Christoph Hellwig Sept. 4, 2019, 6:47 a.m. UTC | #3
On Wed, Sep 04, 2019 at 02:07:39AM +0000, Damien Le Moal wrote:
> OK. I will move the registration earlier in device_add_disk(), before the region
> registration.
> 
> However, I would still like to keep the queue freeze to protect against buggy
> device drivers that call device_add_disk() with internal commands still going
> on. I do not think that there are any such driver, but just want to avoid
> problems. The queue freeze is also present for any user initiated elevator
> change, so in this respect, this is not any different and should not be a big
> problem. Thoughts ?

I don't really see the point, but there should be no harm in it either
since a freeze of a non-busy queue should be fast.
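
For reference, a rough sketch of the ordering discussed above
(illustrative only, not the actual v3 change): elevator_init_mq() is
called in __device_add_disk() once driver probing is complete, but
before blk_register_region() and register_disk() make the disk
reachable from user space, so no I/O can hit the queue while the
scheduler is being set up.

/* In __device_add_disk(), sketch of the suggested ordering: */

	/* Queue limits and features set during driver probe are final here. */
	elevator_init_mq(disk->queue);

	/* Only now make the disk visible (this triggers the partition scan). */
	blk_register_region(disk_devt(disk), disk->minors, NULL,
			    exact_match, exact_lock, disk);
	register_disk(parent, disk, groups);

	if (register_queue)
		blk_register_queue(disk);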

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0c9b1f403db8..baf0c9cd8237 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2893,8 +2893,6 @@  struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q);
 
-	elevator_init_mq(q);
-
 	return q;
 
 err_hctxs:
diff --git a/block/elevator.c b/block/elevator.c
index 81d0877dbc34..433ce722cf0a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -729,7 +729,14 @@  void elevator_init_mq(struct request_queue *q)
 	if (!e)
 		return;
 
+	blk_mq_freeze_queue(q);
+	blk_mq_quiesce_queue(q);
+
 	err = blk_mq_init_sched(q, e);
+
+	blk_mq_unquiesce_queue(q);
+	blk_mq_unfreeze_queue(q);
+
 	if (err) {
 		pr_warn("\"%s\" elevator initialization failed, "
 			"falling back to \"none\"\n", e->elevator_name);
diff --git a/block/genhd.c b/block/genhd.c
index 54f1f0d381f4..d2114c25dccd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -734,6 +734,9 @@  static void __device_add_disk(struct device *parent, struct gendisk *disk,
 				    exact_match, exact_lock, disk);
 	}
 	register_disk(parent, disk, groups);
+
+	elevator_init_mq(disk->queue);
+
 	if (register_queue)
 		blk_register_queue(disk);