Message ID | 20211013164937.985367-3-axboe@kernel.dk |
---|---|
State | New, archived |
Series | Various block optimizations |
On Wed, Oct 13, 2021 at 10:49:35AM -0600, Jens Axboe wrote:
> If we don't use an IO scheduler or have shared tags, then we don't need
> to call into this external function at all. This saves ~2% for such
> a setup.

Hmm. What happens if you just throw an inline tag onto
blk_mq_get_driver_tag? All the high performance callers should be in
blk-mq.c anyway.

If that isn't enough maybe something like the version below?

```diff
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 38e6651d8b94c..ba9af26d5209d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1126,18 +1126,23 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
 	return true;
 }
 
-bool blk_mq_get_driver_tag(struct request *rq)
+static void blk_mq_inc_active_requests(struct request *rq)
+{
+	if (!(rq->rq_flags & RQF_MQ_INFLIGHT)) {
+		rq->rq_flags |= RQF_MQ_INFLIGHT;
+		__blk_mq_inc_active_requests(rq->mq_hctx);
+	}
+}
+
+inline bool blk_mq_get_driver_tag(struct request *rq)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 
 	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_get_driver_tag(rq))
 		return false;
 
-	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
-			!(rq->rq_flags & RQF_MQ_INFLIGHT)) {
-		rq->rq_flags |= RQF_MQ_INFLIGHT;
-		__blk_mq_inc_active_requests(hctx);
-	}
+	if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
+		blk_mq_inc_active_requests(rq);
 	hctx->tags->rqs[rq->tag] = rq;
 	return true;
 }
```
On 10/13/21 11:22 AM, Christoph Hellwig wrote:
> On Wed, Oct 13, 2021 at 10:49:35AM -0600, Jens Axboe wrote:
>> If we don't use an IO scheduler or have shared tags, then we don't need
>> to call into this external function at all. This saves ~2% for such
>> a setup.
>
> Hmm. What happens if you just throw an inline tag onto
> blk_mq_get_driver_tag?

I'd be surprised if that's any different than my patch in terms of
performance, the fast path would be about the same. I don't feel
strongly about it, can do that instead.
On Wed, Oct 13, 2021 at 11:46:04AM -0600, Jens Axboe wrote:
> On 10/13/21 11:22 AM, Christoph Hellwig wrote:
> > On Wed, Oct 13, 2021 at 10:49:35AM -0600, Jens Axboe wrote:
> >> If we don't use an IO scheduler or have shared tags, then we don't need
> >> to call into this external function at all. This saves ~2% for such
> >> a setup.
> >
> > Hmm. What happens if you just throw an inline tag onto
> > blk_mq_get_driver_tag?
>
> I'd be surprised if that's any different than my patch in terms of
> performance, the fast path would be about the same. I don't feel
> strongly about it, can do that instead.

I find the double indirection in your patch a bit confusing. Not a big
deal if it is actually required, but if we can avoid that I'd prefer
not to add the extra indirection.
On 10/13/21 11:57 AM, Christoph Hellwig wrote:
> On Wed, Oct 13, 2021 at 11:46:04AM -0600, Jens Axboe wrote:
>> On 10/13/21 11:22 AM, Christoph Hellwig wrote:
>>> On Wed, Oct 13, 2021 at 10:49:35AM -0600, Jens Axboe wrote:
>>>> If we don't use an IO scheduler or have shared tags, then we don't need
>>>> to call into this external function at all. This saves ~2% for such
>>>> a setup.
>>>
>>> Hmm. What happens if you just throw an inline tag onto
>>> blk_mq_get_driver_tag?
>>
>> I'd be surprised if that's any different than my patch in terms of
>> performance, the fast path would be about the same. I don't feel
>> strongly about it, can do that instead.
>
> I find the double indirection in your patch a bit confusing. Not a big
> deal if it is actually required, but if we can avoid that I'd prefer
> not to add the extra indirection.

Tested the variants, and it does seem to be the best one...
```diff
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 46a91e5fabc5..fe3e926c20a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1135,7 +1135,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
 	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
 }
 
-static bool __blk_mq_get_driver_tag(struct request *rq)
+static bool __blk_mq_alloc_driver_tag(struct request *rq)
 {
 	struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
 	unsigned int tag_offset = rq->mq_hctx->tags->nr_reserved_tags;
@@ -1159,11 +1159,9 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
 	return true;
 }
 
-bool blk_mq_get_driver_tag(struct request *rq)
+bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
-	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
-
-	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_get_driver_tag(rq))
+	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_alloc_driver_tag(rq))
 		return false;
 
 	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8be447995106..ceed0a001c76 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -264,7 +264,20 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
 }
 
-bool blk_mq_get_driver_tag(struct request *rq);
+bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq);
+
+static inline bool blk_mq_get_driver_tag(struct request *rq)
+{
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
+
+	if (rq->tag != BLK_MQ_NO_TAG &&
+	    !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		hctx->tags->rqs[rq->tag] = rq;
+		return true;
+	}
+
+	return __blk_mq_get_driver_tag(hctx, rq);
+}
 
 static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 {
```
If we don't use an IO scheduler or have shared tags, then we don't need
to call into this external function at all. This saves ~2% for such
a setup.

```
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c |  8 +++-----
 block/blk-mq.h | 15 ++++++++++++++-
 2 files changed, 17 insertions(+), 6 deletions(-)
```