Message ID | 20230813133643.3006943-1-chengming.zhou@linux.dev
---|---
State | New, archived
Series | blk-mq: release scheduler resource when request complete
On 8/13/23 7:36 AM, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Chuck reported [1] an IO hang problem on NFS exports that reside on SATA
> devices and bisected it to commit 615939a2ae73 ("blk-mq: defer to the
> normal submission path for post-flush requests").
>
> We analysed the IO hang and found two postflush requests waiting for
> each other.
>
> The first postflush request completed the REQ_FSEQ_DATA sequence, so it
> moved on to the REQ_FSEQ_POSTFLUSH sequence and was added to the flush
> pending list, but blk_kick_flush() failed because of the second postflush
> request, which was still inflight, waiting in the scheduler queue.
>
> The second postflush waiting in the scheduler queue can't be dispatched,
> because the first postflush hasn't released its scheduler resource even
> though it has completed by itself.
>
> Fix it by releasing the scheduler resource when the first postflush
> request completes, so the second postflush can be dispatched and
> completed, which in turn lets blk_kick_flush() succeed.

Change looks good to me. But since we're in here:

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f14b8669ac69..5b14f18f9670 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -682,6 +682,15 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
>  
> +static void blk_mq_finish_request(struct request *rq)
> +{
> +	struct request_queue *q = rq->q;
> +
> +	if ((rq->rq_flags & RQF_USE_SCHED) &&
> +	    q->elevator->type->ops.finish_request)
> +		q->elevator->type->ops.finish_request(rq);
> +}

Any IO scheduler should set ->finish_request(), so this should just be:

static void blk_mq_finish_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	if (rq->rq_flags & RQF_USE_SCHED)
		q->elevator->type->ops.finish_request(rq);
}

and it would probably be a good idea to solidify that with a:

	if (WARN_ON_ONCE(!e->ops.finish_request))
		return -EINVAL;

at the top of elv_register(), like we have for insert/dispatch as well.
All 3 IO schedulers do set ->finish_request().
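For reference, here is a rough sketch of the elv_register() hardening
suggested above. It is illustrative only, not the actual patch, and it
assumes the function keeps its current shape in block/elevator.c, where
the insert/dispatch checks mentioned above already live:

	int elv_register(struct elevator_type *e)
	{
		/* suggested: make ->finish_request mandatory for all schedulers */
		if (WARN_ON_ONCE(!e->ops.finish_request))
			return -EINVAL;

		/* existing checks for the other mandatory hooks */
		if (WARN_ON_ONCE(!e->ops.insert_requests ||
				 !e->ops.dispatch_request))
			return -EINVAL;

		/* ... rest of elv_register() unchanged ... */
	}

With that guard in place, a scheduler that forgets the hook fails loudly
at registration time instead of triggering a NULL call at completion time.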
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f14b8669ac69..5b14f18f9670 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -682,6 +682,15 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 }
 EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);
 
+static void blk_mq_finish_request(struct request *rq)
+{
+	struct request_queue *q = rq->q;
+
+	if ((rq->rq_flags & RQF_USE_SCHED) &&
+	    q->elevator->type->ops.finish_request)
+		q->elevator->type->ops.finish_request(rq);
+}
+
 static void __blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
@@ -708,10 +717,6 @@ void blk_mq_free_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
 
-	if ((rq->rq_flags & RQF_USE_SCHED) &&
-	    q->elevator->type->ops.finish_request)
-		q->elevator->type->ops.finish_request(rq);
-
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->disk->bdi);
 
@@ -1021,6 +1026,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 	if (blk_mq_need_time_stamp(rq))
 		__blk_mq_end_request_acct(rq, ktime_get_ns());
 
+	blk_mq_finish_request(rq);
+
 	if (rq->end_io) {
 		rq_qos_done(rq->q, rq);
 		if (rq->end_io(rq, error) == RQ_END_IO_FREE)
@@ -1075,6 +1082,8 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
 		if (iob->need_ts)
 			__blk_mq_end_request_acct(rq, now);
 
+		blk_mq_finish_request(rq);
+
 		rq_qos_done(rq->q, rq);
 
 		/*
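As a usage note, here is a hypothetical elevator_type skeleton showing
where ->finish_request is wired up; the "example" name and ex_* function
are made up for illustration and are not from the patch or the kernel
tree. The hook is the release half of whatever per-request scheduler
state was taken at insert/dispatch time, which is exactly the resource
the flush deadlock above was stuck on:

	/* hypothetical scheduler skeleton -- "example" and ex_* are made up */
	static void ex_finish_request(struct request *rq)
	{
		/* pair-release any per-request state taken at insert/dispatch time */
	}

	static struct elevator_type example_sched = {
		.ops = {
			/* .insert_requests / .dispatch_request omitted for brevity */
			.finish_request	= ex_finish_request,	/* mandatory per the suggestion above */
		},
		.elevator_name	= "example",
		.elevator_owner	= THIS_MODULE,
	};

All three in-tree schedulers (mq-deadline, kyber, bfq) already set this
hook, so the suggested WARN_ON_ONCE() would only catch out-of-tree or
future schedulers that forget it.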