Message ID: 20180723155038.22062-1-bart.vanassche@wdc.com (mailing list archive)
State: New, archived
Series: blk-mq: Avoid that a request queue stalls when restarting a shared hctx
On 07/23/2018 11:50 PM, Bart Van Assche wrote:
> The patch below fixes queue stalling when a shared hctx is marked for restart
> (BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero. The
> root cause is that hctxs are shared between queues, but 'shared_hctx_restart'

The blk_mq_hw_ctx structure is also per request_queue.
Please refer to blk_mq_init_allocated_queue -> blk_mq_realloc_hw_ctxs.

> belongs to the particular queue, which in fact may not need to be restarted,
> thus we return from blk_mq_sched_restart() and leave the shared hctx of another
> queue never restarted.
>
> The fix is to make the shared_hctx_restart counter belong not to the queue, but
> to tags, so that the counter reflects the real number of shared hctxs that need
> to be restarted.
>
> During tests 1 hctx (set->nr_hw_queues) was used and all stalled requests
> were noticed in dd->fifo_list of the mq-deadline scheduler.
>
> Possible sequence of events:
>
> 1. Request A of queue A is inserted into dd->fifo_list of the scheduler.
>
> 2. Request B of queue A bypasses the scheduler and goes directly to
>    hctx->dispatch.
>
> 3. Request C of queue B is inserted.
>
> 4. blk_mq_sched_dispatch_requests() is invoked; since hctx->dispatch is not
>    empty (request B is in the list), the hctx is only marked for the next
>    restart and request A is left in the list (see the comment "So it's best
>    to leave them there for as long as we can. Mark the hw queue as needing
>    a restart in that case." in blk-mq-sched.c).
>
> 5. Eventually request B is completed/freed and blk_mq_sched_restart() is
>    called, but by chance the hctx from queue B is chosen for restart and
>    request C gets a chance to be dispatched.

Request C is just inserted into queue B. If there is no restart mark there,
it will not be chosen:

blk_mq_sched_restart_hctx
    if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
        return false;

If blk_mq_sched_restart chooses queue B, one of its hctxs must have the
SCHED_RESTART flag, and q->shared_hctx_restart must not be zero.

> 6. Eventually request C is completed/freed and blk_mq_sched_restart() is
>    called, but shared_hctx_restart for queue B is zero and we return without
>    an attempt to restart the hctx from queue A, thus request A is stuck forever.
>
> But a stalling queue is not the only problem with blk_mq_sched_restart().
> My tests show that those loops through all queues and hctxs can be very costly,
> even with the shared_hctx_restart counter, which aims to fix the performance
> issue.

Currently, SCHED_RESTART is always set when there are requests on the
hctx->dispatch list in blk_mq_sched_dispatch_requests, and running out of
driver tags is the main reason for hctx->dispatch being non-empty when an
I/O scheduler is attached. Therefore, most of the time, blk_mq_sched_restart
is invoked for the no-driver-tag case.

For the non-shared-tag case, it will wake up the hctx. But for the shared-tag
case it is unnecessary, because the sbitmap_queue wakeup hook will do the work
and hctx_may_queue() avoids starvation of the other queues. Therefore, the
costly loop through the queues and hctxs is unnecessary most of the time.

Thanks
Jianchao
On Tue, 2018-07-24 at 16:10 +0800, jianchao.wang wrote:
> The blk_mq_hw_ctx structure is also per request_queue
> Please refer to blk_mq_init_allocated_queue -> blk_mq_realloc_hw_ctxs

Hello Jianchao,

I think that Ming Lei's patch "blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()" makes Roman's patch superfluous. When I reposted
Roman's patch I had not yet noticed that Ming's patch is already in Jens'
tree. See also
https://www.mail-archive.com/linux-block@vger.kernel.org/msg22956.html.

Bart.
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 56c493c6cd90..d863b1b32b07 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -60,10 +60,10 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		return;
 
 	if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
-		struct request_queue *q = hctx->queue;
+		struct blk_mq_tags *tags = hctx->tags;
 
 		if (!test_and_set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-			atomic_inc(&q->shared_hctx_restart);
+			atomic_inc(&tags->shared_hctx_restart);
 	} else
 		set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
@@ -74,10 +74,8 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 		return false;
 
 	if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
-		struct request_queue *q = hctx->queue;
-
 		if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-			atomic_dec(&q->shared_hctx_restart);
+			atomic_dec(&hctx->tags->shared_hctx_restart);
 	} else
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
@@ -415,7 +413,7 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
 	 * If this is 0, then we know that no hardware queues
 	 * have RESTART marked. We're done.
 	 */
-	if (!atomic_read(&queue->shared_hctx_restart))
+	if (!atomic_read(&tags->shared_hctx_restart))
 		return;
 
 	rcu_read_lock();
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 09b2ee6694fb..82cd73631adc 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -379,6 +379,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	atomic_set(&tags->shared_hctx_restart, 0);
 
 	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
 }
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..477a9d67fb3d 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -12,6 +12,7 @@ struct blk_mq_tags {
 	unsigned int nr_reserved_tags;
 
 	atomic_t active_queues;
+	atomic_t shared_hctx_restart;
 
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d394cdd8d8c6..a0fdf80db8fd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2335,11 +2335,11 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared) {
 			if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-				atomic_inc(&q->shared_hctx_restart);
+				atomic_inc(&hctx->tags->shared_hctx_restart);
 			hctx->flags |= BLK_MQ_F_TAG_SHARED;
 		} else {
 			if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
-				atomic_dec(&q->shared_hctx_restart);
+				atomic_dec(&hctx->tags->shared_hctx_restart);
 			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
 		}
 	}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 79226ca8f80f..62b20da653ca 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -442,8 +442,6 @@ struct request_queue {
 	int			nr_rqs[2];	/* # allocated [a]sync rqs */
 	int			nr_rqs_elvpriv;	/* # allocated rqs w/ elvpriv */
 
-	atomic_t		shared_hctx_restart;
-
 	struct blk_queue_stats	*stats;
 	struct rq_wb		*rq_wb;