diff mbox series

[2/2] blk: Fix lock inversion between ioc lock and bfqd lock

Message ID 20210520223353.11561-3-jack@suse.cz (mailing list archive)
State New, archived
Headers show
Series block: Fix deadlock when merging requests with BFQ | expand

Commit Message

Jan Kara May 20, 2021, 10:33 p.m. UTC
Lockdep complains about lock inversion between ioc->lock and bfqd->lock:

bfqd -> ioc:
 put_io_context+0x33/0x90 -> ioc->lock grabbed
 blk_mq_free_request+0x51/0x140
 blk_put_request+0xe/0x10
 blk_attempt_req_merge+0x1d/0x30
 elv_attempt_insert_merge+0x56/0xa0
 blk_mq_sched_try_insert_merge+0x4b/0x60
 bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
 blk_mq_sched_insert_requests+0xd6/0x2b0
 blk_mq_flush_plug_list+0x154/0x280
 blk_finish_plug+0x40/0x60
 ext4_writepages+0x696/0x1320
 do_writepages+0x1c/0x80
 __filemap_fdatawrite_range+0xd7/0x120
 sync_file_range+0xac/0xf0

ioc->bfqd:
 bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
 put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
 exit_io_context+0x48/0x50
 do_exit+0x7e9/0xdd0
 do_group_exit+0x54/0xc0

To avoid this inversion we change blk_mq_sched_try_insert_merge() to not
free the merged request but rather leave that up to the caller, similarly
to blk_mq_sched_try_merge(). And in bfq_insert_request() we make sure
to free the request after dropping bfqd->lock. As a nice consequence,
this also makes the locking rules in bfq_finish_requeue_request() more
consistent.

Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 20 +++++++-------------
 block/blk-merge.c   | 19 ++++++++-----------
 block/blk.h         |  2 +-
 block/mq-deadline.c |  4 +++-
 4 files changed, 19 insertions(+), 26 deletions(-)

Comments

Ming Lei May 21, 2021, 12:57 a.m. UTC | #1
On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> 
> bfqd -> ioc:
>  put_io_context+0x33/0x90 -> ioc->lock grabbed
>  blk_mq_free_request+0x51/0x140
>  blk_put_request+0xe/0x10
>  blk_attempt_req_merge+0x1d/0x30
>  elv_attempt_insert_merge+0x56/0xa0
>  blk_mq_sched_try_insert_merge+0x4b/0x60
>  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed

We could move blk_put_request() into the scheduler code; then the lock
inversion is avoided. So far only mq-deadline and bfq call into
blk_mq_sched_try_insert_merge(), so this change should be small.


Thanks,
Ming
Khazhy Kumykov May 21, 2021, 3:29 a.m. UTC | #2
On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> > Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> >
> > bfqd -> ioc:
> >  put_io_context+0x33/0x90 -> ioc->lock grabbed
> >  blk_mq_free_request+0x51/0x140
> >  blk_put_request+0xe/0x10
> >  blk_attempt_req_merge+0x1d/0x30
> >  elv_attempt_insert_merge+0x56/0xa0
> >  blk_mq_sched_try_insert_merge+0x4b/0x60
> >  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
>
> We could move blk_put_request() into the scheduler code; then the lock
> inversion is avoided. So far only mq-deadline and bfq call into
> blk_mq_sched_try_insert_merge(), so this change should be small.

We'd potentially be putting multiple requests if we keep the recursive merge.

Could we move the backmerge loop to the schedulers, perhaps?

>
>
> Thanks,
> Ming
>
Ming Lei May 21, 2021, 6:54 a.m. UTC | #3
On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote:
> On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> > >
> > > bfqd -> ioc:
> > >  put_io_context+0x33/0x90 -> ioc->lock grabbed
> > >  blk_mq_free_request+0x51/0x140
> > >  blk_put_request+0xe/0x10
> > >  blk_attempt_req_merge+0x1d/0x30
> > >  elv_attempt_insert_merge+0x56/0xa0
> > >  blk_mq_sched_try_insert_merge+0x4b/0x60
> > >  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
> >
> > We could move blk_put_request() into the scheduler code; then the lock
> > inversion is avoided. So far only mq-deadline and bfq call into
> > blk_mq_sched_try_insert_merge(), so this change should be small.
> 
> We'd potentially be putting multiple requests if we keep the recursive merge.

Oh, we can still pass a list to hold all the requests to be freed, then
free them all outside in the scheduler code.


Thanks, 
Ming
Jan Kara May 21, 2021, 12:05 p.m. UTC | #4
On Fri 21-05-21 14:54:09, Ming Lei wrote:
> On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote:
> > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> > > >
> > > > bfqd -> ioc:
> > > >  put_io_context+0x33/0x90 -> ioc->lock grabbed
> > > >  blk_mq_free_request+0x51/0x140
> > > >  blk_put_request+0xe/0x10
> > > >  blk_attempt_req_merge+0x1d/0x30
> > > >  elv_attempt_insert_merge+0x56/0xa0
> > > >  blk_mq_sched_try_insert_merge+0x4b/0x60
> > > >  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
> > >
> > > We could move blk_put_request() into the scheduler code; then the lock
> > > inversion is avoided. So far only mq-deadline and bfq call into
> > > blk_mq_sched_try_insert_merge(), so this change should be small.
> > 
> > We'd potentially be putting multiple requests if we keep the recursive merge.
> 
> > Oh, we can still pass a list to hold all the requests to be freed, then
> > free them all outside in the scheduler code.

If we cannot really get rid of the recursive merge (I'm not yet convinced),
this is also an option I've considered. I was worried about what we could
use in struct request to attach the request to a list, but it seems the
.merged_requests handlers already remove the request from the queuelist,
so we should be fine using that.

								Honza
Ming Lei May 21, 2021, 1:36 p.m. UTC | #5
On Fri, May 21, 2021 at 02:05:51PM +0200, Jan Kara wrote:
> On Fri 21-05-21 14:54:09, Ming Lei wrote:
> > On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote:
> > > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> > > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> > > > >
> > > > > bfqd -> ioc:
> > > > >  put_io_context+0x33/0x90 -> ioc->lock grabbed
> > > > >  blk_mq_free_request+0x51/0x140
> > > > >  blk_put_request+0xe/0x10
> > > > >  blk_attempt_req_merge+0x1d/0x30
> > > > >  elv_attempt_insert_merge+0x56/0xa0
> > > > >  blk_mq_sched_try_insert_merge+0x4b/0x60
> > > > >  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
> > > >
> > > > We could move blk_put_request() into the scheduler code; then the lock
> > > > inversion is avoided. So far only mq-deadline and bfq call into
> > > > blk_mq_sched_try_insert_merge(), so this change should be small.
> > > 
> > > We'd potentially be putting multiple requests if we keep the recursive merge.
> > 
> > > Oh, we can still pass a list to hold all the requests to be freed, then
> > > free them all outside in the scheduler code.
> 
> > If we cannot really get rid of the recursive merge (I'm not yet convinced),
> > this is also an option I've considered. I was worried about what we could
> > use in struct request to attach the request to a list, but it seems the
> > .merged_requests handlers already remove the request from the queuelist,
> > so we should be fine using that.

The request has already been removed from the scheduler queue and is safe
to free, so it is also safe to hold it on a temporary list.

Thanks,
Ming
Jan Kara May 21, 2021, 1:47 p.m. UTC | #6
On Fri 21-05-21 21:36:05, Ming Lei wrote:
> On Fri, May 21, 2021 at 02:05:51PM +0200, Jan Kara wrote:
> > On Fri 21-05-21 14:54:09, Ming Lei wrote:
> > > On Thu, May 20, 2021 at 08:29:49PM -0700, Khazhy Kumykov wrote:
> > > > On Thu, May 20, 2021 at 5:57 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > > >
> > > > > On Fri, May 21, 2021 at 12:33:53AM +0200, Jan Kara wrote:
> > > > > > Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
> > > > > >
> > > > > > bfqd -> ioc:
> > > > > >  put_io_context+0x33/0x90 -> ioc->lock grabbed
> > > > > >  blk_mq_free_request+0x51/0x140
> > > > > >  blk_put_request+0xe/0x10
> > > > > >  blk_attempt_req_merge+0x1d/0x30
> > > > > >  elv_attempt_insert_merge+0x56/0xa0
> > > > > >  blk_mq_sched_try_insert_merge+0x4b/0x60
> > > > > >  bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
> > > > >
> > > > > We could move blk_put_request() into the scheduler code; then the lock
> > > > > inversion is avoided. So far only mq-deadline and bfq call into
> > > > > blk_mq_sched_try_insert_merge(), so this change should be small.
> > > > 
> > > > We'd potentially be putting multiple requests if we keep the recursive merge.
> > > 
> > > > Oh, we can still pass a list to hold all the requests to be freed, then
> > > > free them all outside in the scheduler code.
> > 
> > > If we cannot really get rid of the recursive merge (I'm not yet convinced),
> > > this is also an option I've considered. I was worried about what we could
> > > use in struct request to attach the request to a list, but it seems the
> > > .merged_requests handlers already remove the request from the queuelist,
> > > so we should be fine using that.
> 
> The request has already been removed from the scheduler queue and is safe
> to free, so it is also safe to hold it on a temporary list.

Not quite: there's still the ->finish_request hook that will be called from
blk_mq_free_request() on the request, and e.g. BFQ performs quite a lot of
cleanup there. But yes, at least queuelist seems to be available for reuse
here.

								Honza

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index acd1f881273e..4afdf0b93124 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2317,9 +2317,9 @@  static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
 
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
 
+	spin_unlock_irq(&bfqd->lock);
 	if (free)
 		blk_mq_free_request(free);
-	spin_unlock_irq(&bfqd->lock);
 
 	return ret;
 }
@@ -5933,6 +5933,7 @@  static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	spin_lock_irq(&bfqd->lock);
 	if (blk_mq_sched_try_insert_merge(q, rq)) {
 		spin_unlock_irq(&bfqd->lock);
+		blk_put_request(rq);
 		return;
 	}
 
@@ -6376,6 +6377,7 @@  static void bfq_finish_requeue_request(struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;
+	unsigned long flags;
 
 	/*
 	 * rq either is not associated with any icq, or is an already
@@ -6393,18 +6395,12 @@  static void bfq_finish_requeue_request(struct request *rq)
 					     rq->io_start_time_ns,
 					     rq->cmd_flags);
 
+	spin_lock_irqsave(&bfqd->lock, flags);
 	if (likely(rq->rq_flags & RQF_STARTED)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&bfqd->lock, flags);
-
 		if (rq == bfqd->waited_rq)
 			bfq_update_inject_limit(bfqd, bfqq);
 
 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_requeue_request_body(bfqq);
-
-		spin_unlock_irqrestore(&bfqd->lock, flags);
 	} else {
 		/*
 		 * Request rq may be still/already in the scheduler,
@@ -6414,18 +6410,16 @@  static void bfq_finish_requeue_request(struct request *rq)
 		 * inconsistencies in the time interval from the end
 		 * of this function to the start of the deferred work.
 		 * This situation seems to occur only in process
-		 * context, as a consequence of a merge. In the
-		 * current version of the code, this implies that the
-		 * lock is held.
+		 * context, as a consequence of a merge.
 		 */
-
 		if (!RB_EMPTY_NODE(&rq->rb_node)) {
 			bfq_remove_request(rq->q, rq);
 			bfqg_stats_update_io_remove(bfqq_group(bfqq),
 						    rq->cmd_flags);
 		}
-		bfq_finish_requeue_request_body(bfqq);
 	}
+	bfq_finish_requeue_request_body(bfqq);
+	spin_unlock_irqrestore(&bfqd->lock, flags);
 
 	/*
 	 * Reset private fields. In case of a requeue, this allows
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4d97fb6dd226..1398b52a24b4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -846,18 +846,15 @@  static struct request *attempt_front_merge(struct request_queue *q,
 	return NULL;
 }
 
-int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
-			  struct request *next)
+/*
+ * Try to merge 'next' into 'rq'. Return true if the merge happened, false
+ * otherwise. The caller is responsible for freeing 'next' if the merge
+ * happened.
+ */
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			   struct request *next)
 {
-	struct request *free;
-
-	free = attempt_merge(q, rq, next);
-	if (free) {
-		blk_put_request(free);
-		return 1;
-	}
-
-	return 0;
+	return attempt_merge(q, rq, next);
 }
 
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
diff --git a/block/blk.h b/block/blk.h
index 8b3591aee0a5..99ef4f7e7a70 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -225,7 +225,7 @@  ssize_t part_timeout_store(struct device *, struct device_attribute *,
 void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
 int ll_back_merge_fn(struct request *req, struct bio *bio,
 		unsigned int nr_segs);
-int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 				struct request *next);
 unsigned int blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 8eea2cbf2bf4..64dd78005ae6 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -494,8 +494,10 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 */
 	blk_req_zone_write_unlock(rq);
 
-	if (blk_mq_sched_try_insert_merge(q, rq))
+	if (blk_mq_sched_try_insert_merge(q, rq)) {
+		blk_put_request(rq);
 		return;
+	}
 
 	trace_block_rq_insert(rq);