Message ID | 1491417192.2787.11.camel@sandisk.com (mailing list archive)
---|---
State | New, archived
On Wed, Apr 05, 2017 at 06:33:14PM +0000, Bart Van Assche wrote:
> On Wed, 2017-04-05 at 11:28 -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> >
> > While dispatching requests, if we fail to get a driver tag, we mark the
> > hardware queue as waiting for a tag and put the requests on a
> > hctx->dispatch list to be run later when a driver tag is freed. However,
> > blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
> > queues if using a single-queue scheduler with a multiqueue device. If
> > blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
> > are processing. This means we end up using the hardware queue of the
> > previous request, which may or may not be the same as that of the
> > current request. If it isn't, the wrong hardware queue will end up
> > waiting for a tag, and the requests will be on the wrong dispatch list,
> > leading to a hang.
> >
> > The fix is twofold:
> >
> > 1. Make sure we save which hardware queue we were trying to get a
> >    request for in blk_mq_get_driver_tag() regardless of whether it
> >    succeeds or not.
> > 2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
> >    blk_mq_hw_queue to make it clear that it must handle multiple
> >    hardware queues, since I've already messed this up on a couple of
> >    occasions.
> >
> > This didn't appear in testing with nvme and mq-deadline because nvme has
> > more driver tags than the default number of scheduler tags. However,
> > with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.
>
> Would the patch below be a valid alternative?
>
> Thanks,
>
> Bart.

Hi, Bart,

This actually has the same bug as the original code, see below.

> [PATCH] blk-mq: Simplify blk_mq_get_driver_tag()
>
> The blk_mq_get_driver_tag() callers either assume that *hctx is not
> modified or that it points to a valid hctx pointer upon return if
> tag allocation succeeded.
> Avoid this confusion by returning the hctx
> pointer if and only if tag allocation succeeded and by only storing
> the return value into hctx in those blk_mq_get_driver_tag() callers
> for which the hctx pointer had not yet been computed before the
> blk_mq_get_driver_tag() call.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  block/blk-mq-sched.c |  4 +++-
>  block/blk-mq.c       | 24 ++++++++++--------------
>  block/blk-mq.h       |  3 +--
>  3 files changed, 14 insertions(+), 17 deletions(-)

> [snip]

>  static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
> @@ -985,7 +980,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
>  		struct blk_mq_queue_data bd;
>
>  		rq = list_first_entry(list, struct request, queuelist);
> -		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
> +		if (!blk_mq_get_driver_tag(rq, false)) {

Here, we want to know what hardware queue we attempted the tag allocation
on, so this won't work.

Thanks for taking a look!
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a85d939ef450..bfb8bdb95a87 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -386,7 +386,9 @@ void blk_mq_sched_restart_a_queue(struct blk_mq_hw_ctx *hctx)
 static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
 				      struct request *rq, bool can_block)
 {
-	if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
+	WARN_ON_ONCE(!hctx);
+
+	if (blk_mq_get_driver_tag(rq, can_block)) {
 		blk_insert_flush(rq);
 		blk_mq_run_hw_queue(hctx, true);
 	} else
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 45e7f597cea3..c8e0c02dc8ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -857,8 +857,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
 	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
 }
 
-bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
-			   bool wait)
+struct blk_mq_hw_ctx *blk_mq_get_driver_tag(struct request *rq, bool wait)
 {
 	struct blk_mq_alloc_data data = {
 		.q = rq->q,
@@ -866,12 +865,8 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 		.flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
 	};
 
-	if (rq->tag != -1) {
-done:
-		if (hctx)
-			*hctx = data.hctx;
-		return true;
-	}
+	if (rq->tag != -1)
+		return data.hctx;
 
 	if (blk_mq_tag_is_reserved(data.hctx->sched_tags, rq->internal_tag))
 		data.flags |= BLK_MQ_REQ_RESERVED;
@@ -883,10 +878,10 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 			atomic_inc(&data.hctx->nr_active);
 		}
 		data.hctx->tags->rqs[rq->tag] = rq;
-		goto done;
+		return data.hctx;
 	}
 
-	return false;
+	return NULL;
 }
 
 static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
@@ -985,7 +980,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 		struct blk_mq_queue_data bd;
 
 		rq = list_first_entry(list, struct request, queuelist);
-		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
+		if (!blk_mq_get_driver_tag(rq, false)) {
 			if (!queued && reorder_tags_to_front(list))
 				continue;
@@ -999,7 +994,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 			 * window between the allocation failure and
 			 * adding the hardware queue to the wait queue.
 			 */
-			if (!blk_mq_get_driver_tag(rq, &hctx, false))
+			if (!blk_mq_get_driver_tag(rq, false))
 				break;
 		} else {
 			break;
@@ -1020,7 +1015,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list)
 			struct request *nxt;
 
 			nxt = list_first_entry(list, struct request, queuelist);
-			bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
+			bd.last = !blk_mq_get_driver_tag(nxt, false);
 		}
 
 		ret = q->mq_ops->queue_rq(hctx, &bd);
@@ -1435,7 +1430,8 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie,
 	if (q->elevator)
 		goto insert;
 
-	if (!blk_mq_get_driver_tag(rq, &hctx, false))
+	hctx = blk_mq_get_driver_tag(rq, false);
+	if (!hctx)
 		goto insert;
 
 	new_cookie = request_to_qc_t(hctx, rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8d49c06fc520..b1917fbe955c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -33,8 +33,7 @@ void blk_mq_wake_waiters(struct request_queue *q);
 bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *, struct list_head *);
 void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list);
 bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx);
-bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
-			   bool wait);
+struct blk_mq_hw_ctx *blk_mq_get_driver_tag(struct request *rq, bool wait);
 
 /*
  * Internal helpers for allocating/freeing the request map