diff mbox

blk-mq-sched: fix possible crash if changing scheduler fails

Message ID b83737e281e311179615f93be9cfbae50addb1a0.1485215087.git.osandov@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Omar Sandoval Jan. 23, 2017, 11:47 p.m. UTC
From: Omar Sandoval <osandov@fb.com>

In elevator_switch(), we free the old scheduler's tags and then
initialize the new scheduler. If initializing the new scheduler fails,
we fall back to the old scheduler, but our tags have already been freed.
There's no reason to free the sched_tags only to reallocate them, so
let's just reuse them, which makes it possible to safely fall back to
the old scheduler.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
Applies to for-4.11/block. Reproduced the issue and verified the fix by
sticking an err = -ENOMEM in the right place. As far as I can tell, it's
safe to reuse the previously allocated sched_tags for the new scheduler.

 block/elevator.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Comments

Jens Axboe Jan. 24, 2017, 3:01 p.m. UTC | #1
On 01/23/2017 04:47 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> In elevator_switch(), we free the old scheduler's tags and then
> initialize the new scheduler. If initializing the new scheduler fails,
> we fall back to the old scheduler, but our tags have already been freed.
> There's no reason to free the sched_tags only to reallocate them, so
> let's just reuse them, which makes it possible to safely fall back to
> the old scheduler.

Do we need a failure case on both ends? We never free the driver
tags, so it's always perfectly safe to fall back to not running
with a scheduler.

Since we were talking about making the scheduler depth configurable
or dependent on the scheduler, reusing tags seems like something that
would potentially be a short lived "feature".

So I think I'd prefer if we teardown/setup like we do now, and just
have the failure case be falling back to "none".
diff mbox

Patch

diff --git a/block/elevator.c b/block/elevator.c
index ef7f59469acc..28283aaab04c 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -963,9 +963,6 @@  static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	if (old) {
 		old_registered = old->registered;
 
-		if (old->uses_mq)
-			blk_mq_sched_teardown(q);
-
 		if (!q->mq_ops)
 			blk_queue_bypass_start(q);
 
@@ -981,7 +978,14 @@  static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	/* allocate, init and register new elevator */
 	if (new_e) {
 		if (new_e->uses_mq) {
-			err = blk_mq_sched_setup(q);
+			/*
+			 * If we were previously using an I/O scheduler, the
+			 * sched_tags are already allocated.
+			 */
+			if (old)
+				err = 0;
+			else
+				err = blk_mq_sched_setup(q);
 			if (!err)
 				err = new_e->ops.mq.init_sched(q, new_e);
 		} else
@@ -997,6 +1001,12 @@  static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 
 	/* done, kill the old one and finish */
 	if (old) {
+		/*
+		 * If we switched to another I/O scheduler, then we shouldn't
+		 * free sched_tags.
+		 */
+		if (old->uses_mq && !new_e)
+			blk_mq_sched_teardown(q);
 		elevator_exit(old);
 		if (!q->mq_ops)
 			blk_queue_bypass_end(q);
@@ -1015,7 +1025,7 @@  static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
 	return 0;
 
 fail_register:
-	if (q->mq_ops)
+	if (q->mq_ops && !old)
 		blk_mq_sched_teardown(q);
 	elevator_exit(q->elevator);
 fail_init: