Message ID | 20210827124100.98112-2-Niklas.Cassel@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | improve io scheduler callback triggering |
On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
>
> Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> RQF_ELVPRIV.
>
> Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> set for the request in blk_mq_submit_bio(), regardless if the request
> was submitted to a scheduler, or bypassed the scheduler.
>
> Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> if it is, the ops.finish_request callback will be called.
>
> The problem with this is that the finish_request scheduler callback
> will be called for requests that bypassed the scheduler.
>
> Fix this by calling the scheduler ops.prepare_request callback, and
> set the RQF_ELVPRIV flag only immediately before calling the insert
> callback.

One request could be inserted more than once, such as on requeue,
however __blk_mq_alloc_request() is just run once, so is it fine to
call ->prepare_request more than one time for same request?

Or I am wondering why not call ->prepare_request when the following
check is true?

	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
		!blk_op_is_passthrough(data->cmd_flags))
		e->type->ops.prepare_request()

Thanks,
Ming
On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> >
> > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > RQF_ELVPRIV.
> >
> > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > set for the request in blk_mq_submit_bio(), regardless if the request
> > was submitted to a scheduler, or bypassed the scheduler.
> >
> > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > if it is, the ops.finish_request callback will be called.
> >
> > The problem with this is that the finish_request scheduler callback
> > will be called for requests that bypassed the scheduler.
> >
> > Fix this by calling the scheduler ops.prepare_request callback, and
> > set the RQF_ELVPRIV flag only immediately before calling the insert
> > callback.
>
> One request could be inserted more than once, such as on requeue,
> however __blk_mq_alloc_request() is just run once, so is it fine to
> call ->prepare_request more than one time for same request?

Calling ->prepare_request multiple times is fine.
All the different I/O schedulers (BFQ, mq-deadline, kyber)
simply use .prepare_request to clear/set elv->priv to a fixed value.

>
> Or I am wondering why not call ->prepare_request when the following
> check is true?
>
> 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> 		!blk_op_is_passthrough(data->cmd_flags))
> 		e->type->ops.prepare_request()

That might work, and might be a nicer solution indeed.

If a request got plugged, it will be inserted to the scheduler through
blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
which will insert them unconditionally.
In this case, we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
would have inserted directly.)

If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
blk_mq_sched_bypass_insert() returns false:

blk_mq_sched_bypass_insert() is defined as:

	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
		return true;

Also in this case, we know that !op_is_flush() (blk_mq_submit_bio() would have
inserted directly.)

So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?

This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
flag will only be set for a request for which op_is_flush() returned true?

(If so, then only adding && !blk_op_is_passthrough(data->cmd_flags) should
be fine.)

Kind regards,
Niklas
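Editorial illustration: the accounting imbalance described in the commit message can be reproduced with a small user-space model. This is only a sketch under stated assumptions; every `toy_` name and `TOY_` constant below is hypothetical and not part of the kernel, and flush requests are left out for brevity. It compares preparing every request at allocation time with skipping the prepare step for passthrough requests, which is the variant suggested in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for RQF_ELVPRIV; not the kernel's definition. */
#define TOY_RQF_ELVPRIV (1u << 0)

/* Counts how often each (hypothetical) scheduler hook ran. */
struct toy_sched {
	int prepared;
	int inserted;
	int finished;
};

struct toy_request {
	unsigned int rq_flags;
	bool passthrough;	/* passthrough requests bypass the scheduler */
};

static void toy_prepare(struct toy_sched *s, struct toy_request *rq)
{
	s->prepared++;
	rq->rq_flags |= TOY_RQF_ELVPRIV;
}

/* Behaviour described in the commit message: prepare at allocation time. */
static void toy_alloc_old(struct toy_sched *s, struct toy_request *rq,
			  bool passthrough)
{
	assert(s && rq);
	rq->rq_flags = 0;
	rq->passthrough = passthrough;
	toy_prepare(s, rq);
}

/* Variant discussed in the thread: skip prepare for passthrough requests. */
static void toy_alloc_new(struct toy_sched *s, struct toy_request *rq,
			  bool passthrough)
{
	assert(s && rq);
	rq->rq_flags = 0;
	rq->passthrough = passthrough;
	if (!passthrough)
		toy_prepare(s, rq);
}

/* Returns true if the scheduler saw the request, false if it was bypassed. */
static bool toy_insert(struct toy_sched *s, struct toy_request *rq)
{
	if (rq->passthrough)
		return false;
	s->inserted++;
	return true;
}

/* Mirrors the free path: the finish hook runs whenever the flag is set. */
static void toy_free(struct toy_sched *s, struct toy_request *rq)
{
	if (rq->rq_flags & TOY_RQF_ELVPRIV)
		s->finished++;
}
```

With `toy_alloc_old()`, a passthrough request ends up with `finished == 1` although `inserted == 0`; with `toy_alloc_new()`, both counters stay at zero for the same request.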
On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > >
> > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > RQF_ELVPRIV.
> > >
> > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > set for the request in blk_mq_submit_bio(), regardless if the request
> > > was submitted to a scheduler, or bypassed the scheduler.
> > >
> > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > > if it is, the ops.finish_request callback will be called.
> > >
> > > The problem with this is that the finish_request scheduler callback
> > > will be called for requests that bypassed the scheduler.
> > >
> > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > callback.
> >
> > One request could be inserted more than once, such as on requeue,
> > however __blk_mq_alloc_request() is just run once, so is it fine to
> > call ->prepare_request more than one time for same request?
>
> Calling ->prepare_request multiple times is fine.
> All the different I/O schedulers (BFQ, mq-deadline, kyber)
> simply use .prepare_request to clear/set elv->priv to a fixed value.
>
> >
> > Or I am wondering why not call ->prepare_request when the following
> > check is true?
> >
> > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > 		!blk_op_is_passthrough(data->cmd_flags))
> > 		e->type->ops.prepare_request()
>
> That might work, and might be a nicer solution indeed.
>
> If a request got plugged, it will be inserted to the scheduler through
> blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
> which will insert them unconditionally.
> In this case, we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
> would have inserted directly.)
>
> If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> blk_mq_sched_bypass_insert() returns false:
>
> blk_mq_sched_bypass_insert() is defined as:
>
> 	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> 		return true;
>
> Also in this case, we know that !op_is_flush() (blk_mq_submit_bio() would have
> inserted directly.)
>
> So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
>
> This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
> flag will only be set for a request for which op_is_flush() returned true?
>
> (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags) should
> be fine.)

BTW, what I meant is the following change, is it fine?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a33d16a7298..f98f8cc05644 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
-
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
@@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
-	if (e) {
+	if (e && !op_is_flush(data->cmd_flags) &&
+			!blk_op_is_passthrough(data->cmd_flags)) {
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
-		    !(data->flags & BLK_MQ_REQ_RESERVED))
+		if (e->type->ops.limit_depth &&
+				!(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
+
+		rq->elv.icq = NULL;
+		if (e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 	}
 
 retry:

Thanks,
Ming
On Mon, Aug 30, 2021 at 06:11:12PM +0800, Ming Lei wrote:
> On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> > On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > > From: Niklas Cassel <niklas.cassel@wdc.com>
> > > >
> > > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > > RQF_ELVPRIV.
> > > >
> > > > Therefore, (if the request is not a flush) the RQF_ELVPRIV flag will be
> > > > set for the request in blk_mq_submit_bio(), regardless if the request
> > > > was submitted to a scheduler, or bypassed the scheduler.
> > > >
> > > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > > > if it is, the ops.finish_request callback will be called.
> > > >
> > > > The problem with this is that the finish_request scheduler callback
> > > > will be called for requests that bypassed the scheduler.
> > > >
> > > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > > callback.
> > >
> > > One request could be inserted more than once, such as on requeue,
> > > however __blk_mq_alloc_request() is just run once, so is it fine to
> > > call ->prepare_request more than one time for same request?
> >
> > Calling ->prepare_request multiple times is fine.
> > All the different I/O schedulers (BFQ, mq-deadline, kyber)
> > simply use .prepare_request to clear/set elv->priv to a fixed value.
> >
> > >
> > > Or I am wondering why not call ->prepare_request when the following
> > > check is true?
> > >
> > > 	if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> > > 		!blk_op_is_passthrough(data->cmd_flags))
> > > 		e->type->ops.prepare_request()
> >
> > That might work, and might be a nicer solution indeed.
> >
> > If a request got plugged, it will be inserted to the scheduler through
> > blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
> > which will insert them unconditionally.
> > In this case, we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
> > would have inserted directly.)
> >
> > If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> > blk_mq_sched_bypass_insert() returns false:
> >
> > blk_mq_sched_bypass_insert() is defined as:
> >
> > 	if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
> > 		return true;
> >
> > Also in this case, we know that !op_is_flush() (blk_mq_submit_bio() would have
> > inserted directly.)
> >
> > So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> > ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> > but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> > we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
> >
> > This flag is set after blk_mq_rq_ctx_init(). Are we sure that RQF_FLUSH_SEQ
> > flag will only be set for a request for which op_is_flush() returned true?
> >
> > (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags) should
> > be fine.)
>
> BTW, what I meant is the following change, is it fine?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0a33d16a7298..f98f8cc05644 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  
>  	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
>  	refcount_set(&rq->ref, 1);
> -
> -	if (!op_is_flush(data->cmd_flags)) {
> -		struct elevator_queue *e = data->q->elevator;
> -
> -		rq->elv.icq = NULL;
> -		if (e && e->type->ops.prepare_request) {
> -			if (e->type->icq_cache)
> -				blk_mq_sched_assign_ioc(rq);
> -
> -			e->type->ops.prepare_request(rq);
> -			rq->rq_flags |= RQF_ELVPRIV;
> -		}
> -	}
> -
>  	data->hctx->queued++;
>  	return rq;
>  }
> @@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
>  	if (data->cmd_flags & REQ_NOWAIT)
>  		data->flags |= BLK_MQ_REQ_NOWAIT;
>  
> -	if (e) {
> +	if (e && !op_is_flush(data->cmd_flags) &&
> +			!blk_op_is_passthrough(data->cmd_flags)) {
>  		/*
>  		 * Flush/passthrough requests are special and go directly to the
>  		 * dispatch list. Don't include reserved tags in the
>  		 * limiting, as it isn't useful.
>  		 */
> -		if (!op_is_flush(data->cmd_flags) &&
> -		    !blk_op_is_passthrough(data->cmd_flags) &&
> -		    e->type->ops.limit_depth &&
> -		    !(data->flags & BLK_MQ_REQ_RESERVED))
> +		if (e->type->ops.limit_depth &&
> +				!(data->flags & BLK_MQ_REQ_RESERVED))
>  			e->type->ops.limit_depth(data->cmd_flags, data);
> +
> +		rq->elv.icq = NULL;
> +		if (e->type->ops.prepare_request) {
> +			if (e->type->icq_cache)
> +				blk_mq_sched_assign_ioc(rq);
> +
> +			e->type->ops.prepare_request(rq);
> +			rq->rq_flags |= RQF_ELVPRIV;
> +		}
>  	}
>  
>  retry:
>

Hello Ming,

Sorry for the delayed reply.

Your patch does not compile, because rq is not defined in this function.

Another problem seems to be that __blk_mq_alloc_request(), at the end of the
function, calls blk_mq_rq_ctx_init(), which will unconditionally set
rq->rq_flags = 0.

The simple patch:

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,7 +328,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
 
-	if (!op_is_flush(data->cmd_flags)) {
+	if (!op_is_flush(data->cmd_flags) &&
+	    !blk_op_is_passthrough(data->cmd_flags)) {
 		struct elevator_queue *e = data->q->elevator;
 
 		rq->elv.icq = NULL;

does appear to solve the problem.

My only worry was the RQF_FLUSH_SEQ flag, but as far as I can tell, it is only
ever set for a request for which op_is_flush() returned true.

Kind regards,
Niklas
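Editorial illustration: the condition that the "simple patch" arrives at can be written as a standalone predicate. The sketch below is a user-space model only; the `TOY_` opcode and flag values are made up for the example and should not be read as the kernel's definitions, and `toy_should_prepare()` is a hypothetical helper that does not exist in blk-mq.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy opcodes and flag bits; the numeric values are assumptions. */
enum {
	TOY_OP_READ = 0,
	TOY_OP_WRITE = 1,
	TOY_OP_DRV_IN = 34,
	TOY_OP_DRV_OUT = 35,
};
#define TOY_OP_MASK		0xffu
#define TOY_REQ_FUA		(1u << 8)
#define TOY_REQ_PREFLUSH	(1u << 9)

/* Models op_is_flush(): any flush-related flag is set. */
static bool toy_op_is_flush(unsigned int cmd_flags)
{
	return (cmd_flags & (TOY_REQ_FUA | TOY_REQ_PREFLUSH)) != 0;
}

/* Models blk_op_is_passthrough(): the opcode is a driver-private one. */
static bool toy_op_is_passthrough(unsigned int cmd_flags)
{
	unsigned int op = cmd_flags & TOY_OP_MASK;

	return op == TOY_OP_DRV_IN || op == TOY_OP_DRV_OUT;
}

/*
 * Hypothetical helper: should the scheduler's prepare hook run (and the
 * ELVPRIV-style flag be set) for a request with these cmd_flags?
 */
static bool toy_should_prepare(bool has_elevator, bool has_prepare_cb,
			       unsigned int cmd_flags)
{
	return has_elevator && has_prepare_cb &&
	       !toy_op_is_flush(cmd_flags) &&
	       !toy_op_is_passthrough(cmd_flags);
}

/* Small usage example covering the cases discussed in the thread. */
void toy_check_examples(void)
{
	assert(toy_should_prepare(true, true, TOY_OP_WRITE));
	assert(!toy_should_prepare(true, true, TOY_OP_WRITE | TOY_REQ_PREFLUSH));
	assert(!toy_should_prepare(true, true, TOY_OP_DRV_OUT));
}
```

Because the predicate depends only on `cmd_flags`, it can be evaluated at allocation time, before any request flag such as RQF_FLUSH_SEQ could have been set.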
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..eacacb7088c1 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -466,6 +466,14 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 	if (e) {
 		LIST_HEAD(list);
 
+		rq->elv.icq = NULL;
+		if (e && e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 		list_add(&rq->queuelist, &list);
 		e->type->ops.insert_requests(hctx, &list, at_head);
 	} else {
@@ -495,6 +503,18 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 
 	e = hctx->queue->elevator;
 	if (e) {
+		struct request *rq;
+
+		list_for_each_entry(rq, list, queuelist) {
+			rq->elv.icq = NULL;
+			if (e && e->type->ops.prepare_request) {
+				if (e->type->icq_cache)
+					blk_mq_sched_assign_ioc(rq);
+
+				e->type->ops.prepare_request(rq);
+				rq->rq_flags |= RQF_ELVPRIV;
+			}
+		}
 		e->type->ops.insert_requests(hctx, list, false);
 	} else {
 		/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9d4fdc2be88a..3527dd9fd10e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -328,19 +328,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
 
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2e12320cb121..a5047c7e9448 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -81,7 +81,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_FAILED		((__force req_flags_t)(1 << 10))
 /* don't warn about errors */
 #define RQF_QUIET		((__force req_flags_t)(1 << 11))
-/* elevator private data attached */
+/* The request has been inserted to an elevator, and thus has private
+   data attached */
 #define RQF_ELVPRIV		((__force req_flags_t)(1 << 12))
 /* account into disk and partition IO statistics */
 #define RQF_IO_STAT		((__force req_flags_t)(1 << 13))