| Message ID | 20190604181736.903-2-bvanassche@acm.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Simplify blk-mq implementation |
On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
> on preemption being disabled for its correctness. Since removing the CPU
> preemption calls does not measurably affect performance, simplify the
> blk-mq code by removing the blk_mq_put_ctx() function and also by not
> disabling preemption in blk_mq_get_ctx().

I like the idea behind this, but I think it makes some small issues we
have in the current code even worse.  As far as I can tell the idea
behind this call was that we operate on the same blk_mq_ctx for the
duration of the I/O submission.  Now it should not matter which one,
that is we don't care if we get preempted, but it should stay the same.

To actually make this work properly we'll need to pass down the
blk_mq_ctx into the I/O scheduler merge path, which dereferences it.
Note that I also have an outstanding series to pass down an additional
parameter there (the bi_phys_segments removal), so we'll need to
coordinate that.
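The hazard Christoph describes can be pictured as follows. This is a hypothetical sketch, not in-tree code; the submit_path_sketch() function and the specific CPU numbers in the comments are illustrations only:

/* Hypothetical sketch (not in-tree code): with preemption left enabled,
 * two lookups during one submission may return different per-CPU ctxs. */
static void submit_path_sketch(struct request_queue *q, struct bio *bio)
{
        struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);     /* e.g. sampled on CPU 2 */

        /* The task may be preempted and migrated to another CPU here ... */

        struct blk_mq_ctx *other = blk_mq_get_ctx(q);   /* ... and now map to CPU 5 */

        /*
         * Either ctx is a valid software queue on its own; the problem is
         * mixing them within one submission, e.g. making the merge decision
         * against one ctx and initializing the request with the other.
         * Hence: look the ctx up once and pass it down.
         */
}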
On 6/8/19 1:19 AM, Christoph Hellwig wrote:
> On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
>> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
>> on preemption being disabled for its correctness. Since removing the CPU
>> preemption calls does not measurably affect performance, simplify the
>> blk-mq code by removing the blk_mq_put_ctx() function and also by not
>> disabling preemption in blk_mq_get_ctx().
>
> I like the idea behind this, but I think it makes some small issues
> we have in the current code even worse. As far as I can tell the idea
> behind this call was that we operate on the same blk_mq_ctx for the
> duration of the I/O submission. Now it should not matter which one,
> that is we don't care if we get preempted, but it should stay the same.

Hi Christoph,

Can you clarify this? Isn't the goal of the rq->mq_ctx = data->ctx
assignment in blk_mq_rq_ctx_init() to ensure that the same blk_mq_ctx is
used during I/O submission? As you know, blk_mq_rq_ctx_init() is called
immediately after a tag has been allocated.

> To actually make this work properly we'll need to pass down the
> blk_mq_ctx into the I/O scheduler merge path, which dereferences it.
> Note that I also have an outstanding series to pass down an additional
> parameter there (the bi_phys_segments removal), so we'll need to
> coordinate that.

I can wait until Jens has queued your patch series.

Thanks,

Bart.
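For context, the assignment Bart refers to sits in blk_mq_rq_ctx_init(). The following is an abridged sketch, not the full function; the elided lines (marked "...") set up the rest of the request:

/* Abridged sketch of blk_mq_rq_ctx_init(); only the ctx-related lines are shown. */
static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
                                          unsigned int tag, unsigned int op)
{
        struct request *rq = blk_mq_tags_from_data(data)->static_rqs[tag];

        /*
         * The ctx picked when the tag was allocated is pinned to the request,
         * so everything that later looks at rq->mq_ctx sees the same software
         * queue regardless of where the submitting task gets scheduled.
         */
        rq->mq_ctx = data->ctx;
        rq->mq_hctx = data->hctx;
        ...
        return rq;
}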
On Mon, Jun 10, 2019 at 03:00:39PM -0700, Bart Van Assche wrote:
> On 6/8/19 1:19 AM, Christoph Hellwig wrote:
>> On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
>>> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
>>> on preemption being disabled for its correctness. Since removing the CPU
>>> preemption calls does not measurably affect performance, simplify the
>>> blk-mq code by removing the blk_mq_put_ctx() function and also by not
>>> disabling preemption in blk_mq_get_ctx().
>>
>> I like the idea behind this, but I think it makes some small issues
>> we have in the current code even worse. As far as I can tell the idea
>> behind this call was that we operate on the same blk_mq_ctx for the
>> duration of the I/O submission. Now it should not matter which one,
>> that is we don't care if we get preempted, but it should stay the same.
>
> Hi Christoph,
>
> Can you clarify this? Isn't the goal of the rq->mq_ctx = data->ctx
> assignment in blk_mq_rq_ctx_init() to ensure that the same blk_mq_ctx is
> used during I/O submission?

Yes, but we still have additional blk_mq_get_ctx() calls that I was
concerned about.  Looking deeper, though, it seems like the additional
ones are just used locally for I/O scheduler merge decisions, and we
should be OK even if the context changes due to a preemption between
the failed merge and the request allocation.

That being said, it would still be nice to pass the ctx from
__blk_mq_sched_bio_merge() to ->bio_merge() instead of having to find
it again in kyber_bio_merge(), but that isn't urgent.
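A hypothetical sketch of the improvement Christoph mentions; the extra ctx argument to ->bio_merge() is an assumption for illustration and is not what this patch (or the tree at this point) implements:

/* Hypothetical: hand the already-resolved ctx to the elevator instead of
 * letting kyber_bio_merge() call blk_mq_get_ctx() a second time. */
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
{
        struct elevator_queue *e = q->elevator;
        struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
        struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx);

        if (e && e->type->ops.bio_merge)
                /* third argument is hypothetical; today ->bio_merge() takes (hctx, bio) */
                return e->type->ops.bio_merge(hctx, ctx, bio);
        ...
}

kyber_bio_merge() could then drop its own blk_mq_get_ctx(hctx->queue) lookup and use the ctx it was handed.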
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 74c6bb871f7e..0daadb8d2274 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -326,10 +326,8 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 	bool ret = false;
 	enum hctx_type type;
 
-	if (e && e->type->ops.bio_merge) {
-		blk_mq_put_ctx(ctx);
+	if (e && e->type->ops.bio_merge)
 		return e->type->ops.bio_merge(hctx, bio);
-	}
 
 	type = hctx->type;
 	if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
@@ -340,7 +338,6 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 		spin_unlock(&ctx->lock);
 	}
 
-	blk_mq_put_ctx(ctx);
 	return ret;
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7513c8eaabee..da19f0bc8876 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -113,7 +113,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	struct sbq_wait_state *ws;
 	DEFINE_SBQ_WAIT(wait);
 	unsigned int tag_offset;
-	bool drop_ctx;
 	int tag;
 
 	if (data->flags & BLK_MQ_REQ_RESERVED) {
@@ -136,7 +135,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		return BLK_MQ_TAG_FAIL;
 
 	ws = bt_wait_ptr(bt, data->hctx);
-	drop_ctx = data->ctx == NULL;
 	do {
 		struct sbitmap_queue *bt_prev;
 
@@ -161,9 +159,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (tag != -1)
 			break;
 
-		if (data->ctx)
-			blk_mq_put_ctx(data->ctx);
-
 		bt_prev = bt;
 		io_schedule();
 
@@ -189,9 +184,6 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		ws = bt_wait_ptr(bt, data->hctx);
 	} while (1);
 
-	if (drop_ctx && data->ctx)
-		blk_mq_put_ctx(data->ctx);
-
 	sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 35948d45d7b9..00d572265aa3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -359,13 +359,13 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 	struct elevator_queue *e = q->elevator;
 	struct request *rq;
 	unsigned int tag;
-	bool put_ctx_on_error = false;
+	bool clear_ctx_on_error = false;
 
 	blk_queue_enter_live(q);
 	data->q = q;
 	if (likely(!data->ctx)) {
 		data->ctx = blk_mq_get_ctx(q);
-		put_ctx_on_error = true;
+		clear_ctx_on_error = true;
 	}
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->cmd_flags,
@@ -391,10 +391,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 
 	tag = blk_mq_get_tag(data);
 	if (tag == BLK_MQ_TAG_FAIL) {
-		if (put_ctx_on_error) {
-			blk_mq_put_ctx(data->ctx);
+		if (clear_ctx_on_error)
 			data->ctx = NULL;
-		}
 		blk_queue_exit(q);
 		return NULL;
 	}
@@ -431,8 +429,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	if (!rq)
 		return ERR_PTR(-EWOULDBLOCK);
 
-	blk_mq_put_ctx(alloc_data.ctx);
-
 	rq->__data_len = 0;
 	rq->__sector = (sector_t) -1;
 	rq->bio = rq->biotail = NULL;
@@ -1975,7 +1971,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 	plug = current->plug;
 	if (unlikely(is_flush_fua)) {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 
 		/* bypass scheduler for flush rq */
@@ -1989,7 +1984,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		unsigned int request_count = plug->rq_count;
 		struct request *last = NULL;
 
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 
 		if (!request_count)
@@ -2023,8 +2017,6 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_add_rq_to_plug(plug, rq);
 		trace_block_plug(q);
 
-		blk_mq_put_ctx(data.ctx);
-
 		if (same_queue_rq) {
 			data.hctx = same_queue_rq->mq_hctx;
 			trace_block_unplug(q, 1, true);
@@ -2033,11 +2025,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		}
 	} else if ((q->nr_hw_queues > 1 && is_sync) ||
 			(!q->elevator && !data.hctx->dispatch_busy)) {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
 	} else {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_sched_insert_request(rq, false, true, true);
 	}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 633a5a77ee8b..f4bf5161333e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -151,12 +151,7 @@ static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
  */
 static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 {
-	return __blk_mq_get_ctx(q, get_cpu());
-}
-
-static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
-{
-	put_cpu();
+	return __blk_mq_get_ctx(q, raw_smp_processor_id());
 }
 
 struct blk_mq_alloc_data {
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index c3b05119cebd..71c7b07b645a 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -574,7 +574,6 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
 	spin_lock(&kcq->lock);
 	merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio);
 	spin_unlock(&kcq->lock);
-	blk_mq_put_ctx(ctx);
 
 	return merged;
 }
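The blk-mq.h hunk is the heart of the series. As a side-by-side sketch of the semantics (the old_/new_ prefixes are only for illustration, not real kernel names): get_cpu() disables preemption so the sampled CPU number stays accurate until put_cpu(), while raw_smp_processor_id() merely samples the current CPU, which may be stale by the time it is used. The patch argues that is acceptable because any per-CPU software queue is a valid submission target.

/* Illustration only; the old_/new_ names below do not exist in the tree. */
static inline struct blk_mq_ctx *old_blk_mq_get_ctx(struct request_queue *q)
{
        /* Preemption stays disabled until the matching blk_mq_put_ctx()/put_cpu(). */
        return __blk_mq_get_ctx(q, get_cpu());
}

static inline struct blk_mq_ctx *new_blk_mq_get_ctx(struct request_queue *q)
{
        /* Just a snapshot of the current CPU; the task may migrate afterwards. */
        return __blk_mq_get_ctx(q, raw_smp_processor_id());
}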
No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
on preemption being disabled for its correctness. Since removing the CPU
preemption calls does not measurably affect performance, simplify the
blk-mq code by removing the blk_mq_put_ctx() function and also by not
disabling preemption in blk_mq_get_ctx().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-sched.c  |  5 +----
 block/blk-mq-tag.c    |  8 --------
 block/blk-mq.c        | 16 +++------------
 block/blk-mq.h        |  7 +------
 block/kyber-iosched.c |  1 -
 5 files changed, 5 insertions(+), 32 deletions(-)