Message ID | 20170428151539.25514-4-ming.lei@redhat.com (mailing list archive)
---|---
State | New, archived
On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote:
> +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> +{
> +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> +		return false;
> +
> +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> +		return false;
> +
> +	return true;
> +}

The only user of shared tag sets I know of is scsi-mq. I think it's really
unfortunate that this patch systematically disables BLK_MQ_F_SCHED_USE_HW_TAG
for scsi-mq.

> int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> {
> 	struct blk_mq_tag_set *set = q->tag_set;
> @@ -2681,9 +2694,12 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> 		break;
> 	}
>
> -	if (!ret)
> +	if (!ret) {
> 		q->nr_requests = nr;
>
> +		blk_mq_update_sched_flag(q);
> +	}
> +
> 	blk_mq_unfreeze_queue(q);
> 	blk_mq_start_stopped_hw_queues(q, true);

If a queue is created with a low value of nr_requests, that will cause
blk_mq_sched_alloc_tags() to skip allocation of .sched_tags. If nr_requests
is increased, can that cause this function to clear BLK_MQ_F_SCHED_USE_HW_TAG
while keeping hctx->sched_tags == NULL and hence trigger a NULL pointer
dereference?

Bart.
On 04/28/2017 09:15 AM, Ming Lei wrote:
> +/*
> + * If this queue has enough hardware tags and doesn't share tags with
> + * other queues, just use hw tag directly for scheduling.
> + */
> +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> +{
> +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> +		return false;
> +
> +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> +		return false;

I think we should leave a bigger gap. Ideally, for scheduling, we should
have a hw queue depth that's around half of what the scheduler has to
work with. That will always leave us something to schedule with, if the
hw can't deplete the whole pool.
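A minimal sketch of the stricter gate Jens is describing, under one possible
reading of "around half": require the hw pool to be at least twice
q->nr_requests before bypassing sched_tags. The 2x ratio is an assumption
for illustration, not something from the posted series:

	/*
	 * Hypothetical stricter check (assumed 2x ratio): only schedule
	 * directly on hw tags when the hardware pool is at least twice
	 * as deep as what the scheduler needs, so the hardware draining
	 * its share still leaves tags to schedule with.
	 */
	static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
	{
		if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
			return false;

		/* require a 2x gap instead of a bare >= comparison */
		if (blk_mq_get_queue_depth(q) < 2 * q->nr_requests)
			return false;

		return true;
	}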
On Fri, 2017-04-28 at 12:22 -0600, Jens Axboe wrote:
> On 04/28/2017 09:15 AM, Ming Lei wrote:
> > +/*
> > + * If this queue has enough hardware tags and doesn't share tags with
> > + * other queues, just use hw tag directly for scheduling.
> > + */
> > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > +{
> > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > +		return false;
> > +
> > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > +		return false;
>
> I think we should leave a bigger gap. Ideally, for scheduling, we should
> have a hw queue depth that's around half of what the scheduler has to
> work with. That will always leave us something to schedule with, if the
> hw can't deplete the whole pool.

Hello Jens,

The scsi-mq core allocates exactly the same number of tags per hardware
queue as the SCSI queue depth. Requiring that there is a gap would cause
BLK_MQ_F_SCHED_USE_HW_TAG not to be enabled for any scsi-mq LLD. I'm not
sure that changing the tag allocation strategy in scsi-mq would be the
best solution. How about changing blk_mq_sched_may_use_hw_tag() into
something like the below to guarantee that the scheduler has sufficient
tags available?

static bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
{
	return blk_mq_get_queue_depth(q) >= max(q->nr_requests, 16);
}

Thanks,

Bart.
On Fri, Apr 28, 2017 at 06:09:40PM +0000, Bart Van Assche wrote:
> On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote:
> > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > +{
> > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > +		return false;
> > +
> > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > +		return false;
> > +
> > +	return true;
> > +}
>
> The only user of shared tag sets I know of is scsi-mq. I think it's really
> unfortunate that this patch systematically disables BLK_MQ_F_SCHED_USE_HW_TAG
> for scsi-mq.

In a previous version, I actually allowed the driver to pass this flag, but
that feature was dropped in this post, just to keep it simple & clean.
If you think we need it for shared tag sets, I can add it in v1.

For shared tag sets, I suggest not enabling it by default, because the
scheduler is per request queue now, and generally the more requests
available, the better it performs. When tags are shared among several
request queues, one of them may use up tags for its own scheduling and
starve the others. But it should be possible, and not difficult, to
allocate requests fairly for scheduling in this case if we switch to
per-hctx scheduling.

> > int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > {
> > 	struct blk_mq_tag_set *set = q->tag_set;
> > @@ -2681,9 +2694,12 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > 		break;
> > 	}
> >
> > -	if (!ret)
> > +	if (!ret) {
> > 		q->nr_requests = nr;
> >
> > +		blk_mq_update_sched_flag(q);
> > +	}
> > +
> > 	blk_mq_unfreeze_queue(q);
> > 	blk_mq_start_stopped_hw_queues(q, true);
>
> If a queue is created with a low value of nr_requests, that will cause
> blk_mq_sched_alloc_tags() to skip allocation of .sched_tags. If nr_requests
> is increased, can that cause this function to clear BLK_MQ_F_SCHED_USE_HW_TAG
> while keeping hctx->sched_tags == NULL and hence trigger a NULL pointer
> dereference?

Good catch, will fix it in V1. Given that both scheduler switches and
updates to the number of requests are infrequent, we can allocate and
deallocate .sched_tags only when needed.

Thanks,
Ming
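A hedged sketch of the fix Ming outlines — allocate .sched_tags on demand
before the flag is cleared — assuming blk_mq_sched_alloc_tags() were made
callable from blk-mq.c; none of this is from a posted version:

	/*
	 * Hypothetical rework of blk_mq_update_sched_flag(): make sure
	 * every hctx has .sched_tags before BLK_MQ_F_SCHED_USE_HW_TAG is
	 * cleared, so the scheduler never dereferences a NULL sched_tags
	 * after nr_requests grows past the hw queue depth.
	 */
	static int blk_mq_update_sched_flag(struct request_queue *q)
	{
		struct blk_mq_hw_ctx *hctx;
		unsigned int i;
		int ret;

		if (blk_mq_sched_may_use_hw_tag(q)) {
			queue_for_each_hw_ctx(q, hctx, i)
				hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
			return 0;
		}

		queue_for_each_hw_ctx(q, hctx, i) {
			/* allocate scheduler tags on demand */
			if (!hctx->sched_tags) {
				ret = blk_mq_sched_alloc_tags(q, hctx, i);
				if (ret)
					return ret;
			}
			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
		}
		return 0;
	}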
On Fri, Apr 28, 2017 at 12:22:45PM -0600, Jens Axboe wrote:
> On 04/28/2017 09:15 AM, Ming Lei wrote:
> > +/*
> > + * If this queue has enough hardware tags and doesn't share tags with
> > + * other queues, just use hw tag directly for scheduling.
> > + */
> > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > +{
> > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > +		return false;
> > +
> > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > +		return false;
>
> I think we should leave a bigger gap. Ideally, for scheduling, we should
> have a hw queue depth that's around half of what the scheduler has to
> work with. That will always leave us something to schedule with, if the
> hw can't deplete the whole pool.

When .sched_tags and .tags are different, it makes sense to make
nr_requests two times the queue depth. When we switch to scheduling with
hw tags directly, that equation can't hold at all.

The simple policy in this patch can't be worse than a standalone
.sched_tags, because the lifetime of one sched tag is actually the same
as that of its request (from allocation at I/O submission to freeing at
I/O completion). And when we have a bigger queue depth, we can schedule
even more.

Thanks,
Ming
On Sat, 2017-04-29 at 18:35 +0800, Ming Lei wrote:
> On Fri, Apr 28, 2017 at 06:09:40PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote:
> > > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > > +{
> > > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > > +		return false;
> > > +
> > > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> >
> > The only user of shared tag sets I know of is scsi-mq. I think it's really
> > unfortunate that this patch systematically disables BLK_MQ_F_SCHED_USE_HW_TAG
> > for scsi-mq.
>
> In a previous version, I actually allowed the driver to pass this flag, but
> that feature was dropped in this post, just to keep it simple & clean.
> If you think we need it for shared tag sets, I can add it in v1.
>
> For shared tag sets, I suggest not enabling it by default, because the
> scheduler is per request queue now, and generally the more requests
> available, the better it performs. When tags are shared among several
> request queues, one of them may use up tags for its own scheduling and
> starve the others. But it should be possible, and not difficult, to
> allocate requests fairly for scheduling in this case if we switch to
> per-hctx scheduling.

Hello Ming,

Have you noticed that there is already a mechanism in the block layer to
avoid starvation if a tag set is shared? The hctx_may_queue() function
guarantees that each user that shares a tag set gets at least some tags.
The .active_queues counter keeps track of the number of hardware queues
that share a tag set.

Bart.
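For readers following along, the fair-share logic Bart refers to lives in
block/blk-mq-tag.c; the sketch below paraphrases what hctx_may_queue()
looked like around this time (simplified, and details differ between
kernel versions):

	/*
	 * Paraphrased sketch: when the tag set is shared, divide the tag
	 * space among the queues currently submitting (.active_queues),
	 * guaranteeing each active queue at least a handful of tags.
	 */
	static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
					  struct sbitmap_queue *bt)
	{
		unsigned int depth, users;

		if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
			return true;

		users = atomic_read(&hctx->tags->active_queues);
		if (!users)
			return true;

		/* give each active queue a fair share, but at least 4 tags */
		depth = max((bt->sb.depth + users - 1) / users, 4U);
		return atomic_read(&hctx->nr_active) < depth;
	}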
On Mon, May 01, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> On Sat, 2017-04-29 at 18:35 +0800, Ming Lei wrote:
> > On Fri, Apr 28, 2017 at 06:09:40PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote:
> > > > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > > > +{
> > > > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > > > +		return false;
> > > > +
> > > > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > > > +		return false;
> > > > +
> > > > +	return true;
> > > > +}
> > >
> > > The only user of shared tag sets I know of is scsi-mq. I think it's really
> > > unfortunate that this patch systematically disables BLK_MQ_F_SCHED_USE_HW_TAG
> > > for scsi-mq.
> >
> > In a previous version, I actually allowed the driver to pass this flag, but
> > that feature was dropped in this post, just to keep it simple & clean.
> > If you think we need it for shared tag sets, I can add it in v1.
> >
> > For shared tag sets, I suggest not enabling it by default, because the
> > scheduler is per request queue now, and generally the more requests
> > available, the better it performs. When tags are shared among several
> > request queues, one of them may use up tags for its own scheduling and
> > starve the others. But it should be possible, and not difficult, to
> > allocate requests fairly for scheduling in this case if we switch to
> > per-hctx scheduling.
>
> Hello Ming,
>
> Have you noticed that there is already a mechanism in the block layer to
> avoid starvation if a tag set is shared? The hctx_may_queue() function
> guarantees that each user that shares a tag set gets at least some tags.
> The .active_queues counter keeps track of the number of hardware queues
> that share a tag set.
>
> Bart.

The scheduler tags are there to abstract away the hardware, and
USE_HW_TAG should just be an optimization for when that abstraction is a
noop. That's not the case when there are shared tags, and I doubt that
the overhead of the scheduler tags is significant for scsi-mq. Let's
stick with the behavior Ming had here.
On Mon, May 01, 2017 at 03:06:16PM +0000, Bart Van Assche wrote:
> On Sat, 2017-04-29 at 18:35 +0800, Ming Lei wrote:
> > On Fri, Apr 28, 2017 at 06:09:40PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote:
> > > > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > > > +{
> > > > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > > > +		return false;
> > > > +
> > > > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > > > +		return false;
> > > > +
> > > > +	return true;
> > > > +}
> > >
> > > The only user of shared tag sets I know of is scsi-mq. I think it's really
> > > unfortunate that this patch systematically disables BLK_MQ_F_SCHED_USE_HW_TAG
> > > for scsi-mq.
> >
> > In a previous version, I actually allowed the driver to pass this flag, but
> > that feature was dropped in this post, just to keep it simple & clean.
> > If you think we need it for shared tag sets, I can add it in v1.
> >
> > For shared tag sets, I suggest not enabling it by default, because the
> > scheduler is per request queue now, and generally the more requests
> > available, the better it performs. When tags are shared among several
> > request queues, one of them may use up tags for its own scheduling and
> > starve the others. But it should be possible, and not difficult, to
> > allocate requests fairly for scheduling in this case if we switch to
> > per-hctx scheduling.
>
> Hello Ming,
>
> Have you noticed that there is already a mechanism in the block layer to
> avoid starvation if a tag set is shared? The hctx_may_queue() function
> guarantees that each user that shares a tag set gets at least some tags.
> The .active_queues counter keeps track of the number of hardware queues
> that share a tag set.

OK, from hctx_may_queue(), each hw queue can be allocated at most
(queue depth / nr_hw_queues) tags, so we could allow hw tags to be used
for the scheduler too if the following condition is true for shared tag
sets:

	q->nr_requests <= (blk_mq_get_queue_depth(q) / nr_hw_queues)

It should often be true for some SCSI devices (such as virtio-scsi) in
some configurations.

thanks,
Ming
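A hypothetical sketch of how that condition could fold into the check —
not part of any posted version, and the fair share here follows Ming's
per-hw-queue reading rather than hctx_may_queue()'s active_queues count:

	/*
	 * Hypothetical extension for shared tag sets: only use hw tags
	 * for scheduling when each hw queue's worst-case share of the
	 * tag space still covers q->nr_requests.
	 */
	static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
	{
		unsigned int depth = blk_mq_get_queue_depth(q);

		if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
			depth /= q->tag_set->nr_hw_queues;

		return depth >= q->nr_requests;
	}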
On Fri, Apr 28, 2017 at 11:15:38PM +0800, Ming Lei wrote:
> When tag space of one device is big enough, we use hw tag
> directly for I/O scheduling.
>
> Now the decision is made if hw queue depth is not less than
> q->nr_requests and the tag set isn't shared.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-sched.c | 8 ++++++++
>  block/blk-mq-sched.h | 15 +++++++++++++++
>  block/blk-mq.c | 18 +++++++++++++++++-
>  3 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 45a675f07b8b..4681e27c127e 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -507,6 +507,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  	struct elevator_queue *eq;
>  	unsigned int i;
>  	int ret;
> +	bool auto_hw_tag;
>
>  	if (!e) {
>  		q->elevator = NULL;
> @@ -519,7 +520,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  	 */
>  	q->nr_requests = 2 * BLKDEV_MAX_RQ;
>
> +	auto_hw_tag = blk_mq_sched_may_use_hw_tag(q);
> +
>  	queue_for_each_hw_ctx(q, hctx, i) {
> +		if (auto_hw_tag)
> +			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
> +		else
> +			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
> +
>  		ret = blk_mq_sched_alloc_tags(q, hctx, i);
>  		if (ret)
>  			goto err;

I think you should also clear the BLK_MQ_F_SCHED_USE_HW_TAG flag in
blk_mq_exit_sched()?
On Wed, May 03, 2017 at 09:29:36AM -0700, Omar Sandoval wrote:
> On Fri, Apr 28, 2017 at 11:15:38PM +0800, Ming Lei wrote:
> > When tag space of one device is big enough, we use hw tag
> > directly for I/O scheduling.
> >
> > Now the decision is made if hw queue depth is not less than
> > q->nr_requests and the tag set isn't shared.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq-sched.c | 8 ++++++++
> >  block/blk-mq-sched.h | 15 +++++++++++++++
> >  block/blk-mq.c | 18 +++++++++++++++++-
> >  3 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 45a675f07b8b..4681e27c127e 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -507,6 +507,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> >  	struct elevator_queue *eq;
> >  	unsigned int i;
> >  	int ret;
> > +	bool auto_hw_tag;
> >
> >  	if (!e) {
> >  		q->elevator = NULL;
> > @@ -519,7 +520,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> >  	 */
> >  	q->nr_requests = 2 * BLKDEV_MAX_RQ;
> >
> > +	auto_hw_tag = blk_mq_sched_may_use_hw_tag(q);
> > +
> >  	queue_for_each_hw_ctx(q, hctx, i) {
> > +		if (auto_hw_tag)
> > +			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
> > +		else
> > +			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
> > +
> >  		ret = blk_mq_sched_alloc_tags(q, hctx, i);
> >  		if (ret)
> >  			goto err;
>
> I think you should also clear the BLK_MQ_F_SCHED_USE_HW_TAG flag in
> blk_mq_exit_sched()?

That looks unnecessary, since the flag is always re-evaluated in
blk_mq_init_sched().

Thanks,
Ming
On Thu, May 04, 2017 at 12:55:30AM +0800, Ming Lei wrote:
> On Wed, May 03, 2017 at 09:29:36AM -0700, Omar Sandoval wrote:
> > On Fri, Apr 28, 2017 at 11:15:38PM +0800, Ming Lei wrote:
> > > When tag space of one device is big enough, we use hw tag
> > > directly for I/O scheduling.
> > >
> > > Now the decision is made if hw queue depth is not less than
> > > q->nr_requests and the tag set isn't shared.
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  block/blk-mq-sched.c | 8 ++++++++
> > >  block/blk-mq-sched.h | 15 +++++++++++++++
> > >  block/blk-mq.c | 18 +++++++++++++++++-
> > >  3 files changed, 40 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 45a675f07b8b..4681e27c127e 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -507,6 +507,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >  	struct elevator_queue *eq;
> > >  	unsigned int i;
> > >  	int ret;
> > > +	bool auto_hw_tag;
> > >
> > >  	if (!e) {
> > >  		q->elevator = NULL;
> > > @@ -519,7 +520,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >  	 */
> > >  	q->nr_requests = 2 * BLKDEV_MAX_RQ;
> > >
> > > +	auto_hw_tag = blk_mq_sched_may_use_hw_tag(q);
> > > +
> > >  	queue_for_each_hw_ctx(q, hctx, i) {
> > > +		if (auto_hw_tag)
> > > +			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
> > > +		else
> > > +			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
> > > +
> > >  		ret = blk_mq_sched_alloc_tags(q, hctx, i);
> > >  		if (ret)
> > >  			goto err;
> >
> > I think you should also clear the BLK_MQ_F_SCHED_USE_HW_TAG flag in
> > blk_mq_exit_sched()?
>
> That looks unnecessary, since the flag is always re-evaluated in
> blk_mq_init_sched().

What if we're setting the scheduler to "none"? Then blk_mq_init_sched()
will go in here:

	if (!e) {
		q->elevator = NULL;
		return 0;
	}
On Wed, May 03, 2017 at 10:00:04AM -0700, Omar Sandoval wrote:
> On Thu, May 04, 2017 at 12:55:30AM +0800, Ming Lei wrote:
> > On Wed, May 03, 2017 at 09:29:36AM -0700, Omar Sandoval wrote:
> > > On Fri, Apr 28, 2017 at 11:15:38PM +0800, Ming Lei wrote:
> > > > When tag space of one device is big enough, we use hw tag
> > > > directly for I/O scheduling.
> > > >
> > > > Now the decision is made if hw queue depth is not less than
> > > > q->nr_requests and the tag set isn't shared.
> > > >
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  block/blk-mq-sched.c | 8 ++++++++
> > > >  block/blk-mq-sched.h | 15 +++++++++++++++
> > > >  block/blk-mq.c | 18 +++++++++++++++++-
> > > >  3 files changed, 40 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > index 45a675f07b8b..4681e27c127e 100644
> > > > --- a/block/blk-mq-sched.c
> > > > +++ b/block/blk-mq-sched.c
> > > > @@ -507,6 +507,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > > >  	struct elevator_queue *eq;
> > > >  	unsigned int i;
> > > >  	int ret;
> > > > +	bool auto_hw_tag;
> > > >
> > > >  	if (!e) {
> > > >  		q->elevator = NULL;
> > > > @@ -519,7 +520,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > > >  	 */
> > > >  	q->nr_requests = 2 * BLKDEV_MAX_RQ;
> > > >
> > > > +	auto_hw_tag = blk_mq_sched_may_use_hw_tag(q);
> > > > +
> > > >  	queue_for_each_hw_ctx(q, hctx, i) {
> > > > +		if (auto_hw_tag)
> > > > +			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
> > > > +		else
> > > > +			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
> > > > +
> > > >  		ret = blk_mq_sched_alloc_tags(q, hctx, i);
> > > >  		if (ret)
> > > >  			goto err;
> > >
> > > I think you should also clear the BLK_MQ_F_SCHED_USE_HW_TAG flag in
> > > blk_mq_exit_sched()?
> >
> > That looks unnecessary, since the flag is always re-evaluated in
> > blk_mq_init_sched().
>
> What if we're setting the scheduler to "none"? Then blk_mq_init_sched()
> will go in here:
>
> 	if (!e) {
> 		q->elevator = NULL;
> 		return 0;
> 	}

Good catch, will clear the flag in blk_mq_exit_sched() in V2.

Thanks,
Ming
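A minimal sketch of the V2 change Ming agrees to here; the helper name is
made up for illustration:

	/*
	 * Hypothetical helper for blk_mq_exit_sched(): drop the hw-tag
	 * scheduling flag on every hctx when the elevator is torn down,
	 * so switching to "none" doesn't leave BLK_MQ_F_SCHED_USE_HW_TAG
	 * set from a previous scheduler.
	 */
	static void blk_mq_sched_clear_use_hw_tag(struct request_queue *q)
	{
		struct blk_mq_hw_ctx *hctx;
		unsigned int i;

		queue_for_each_hw_ctx(q, hctx, i)
			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
	}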
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 45a675f07b8b..4681e27c127e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -507,6 +507,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	struct elevator_queue *eq;
 	unsigned int i;
 	int ret;
+	bool auto_hw_tag;
 
 	if (!e) {
 		q->elevator = NULL;
@@ -519,7 +520,14 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	 */
 	q->nr_requests = 2 * BLKDEV_MAX_RQ;
 
+	auto_hw_tag = blk_mq_sched_may_use_hw_tag(q);
+
 	queue_for_each_hw_ctx(q, hctx, i) {
+		if (auto_hw_tag)
+			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
+		else
+			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
+
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
 		if (ret)
 			goto err;
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index edafb5383b7b..22a19c118044 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -129,4 +129,19 @@ static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 }
 
+/*
+ * If this queue has enough hardware tags and doesn't share tags with
+ * other queues, just use hw tag directly for scheduling.
+ */
+static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
+{
+	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
+		return false;
+
+	if (blk_mq_get_queue_depth(q) < q->nr_requests)
+		return false;
+
+	return true;
+}
+
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 04761fb76ab4..b0bd1fb4b0f8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2649,6 +2649,19 @@ int blk_mq_get_queue_depth(struct request_queue *q)
 	return tags->bitmap_tags.sb.depth;
 }
 
+static void blk_mq_update_sched_flag(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	int i;
+
+	if (!blk_mq_sched_may_use_hw_tag(q))
+		queue_for_each_hw_ctx(q, hctx, i)
+			hctx->flags &= ~BLK_MQ_F_SCHED_USE_HW_TAG;
+	else
+		queue_for_each_hw_ctx(q, hctx, i)
+			hctx->flags |= BLK_MQ_F_SCHED_USE_HW_TAG;
+}
+
 int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
@@ -2681,9 +2694,12 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		break;
 	}
 
-	if (!ret)
+	if (!ret) {
 		q->nr_requests = nr;
 
+		blk_mq_update_sched_flag(q);
+	}
+
 	blk_mq_unfreeze_queue(q);
 	blk_mq_start_stopped_hw_queues(q, true);
When tag space of one device is big enough, we use hw tag
directly for I/O scheduling.

Now the decision is made if hw queue depth is not less than
q->nr_requests and the tag set isn't shared.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c |  8 ++++++++
 block/blk-mq-sched.h | 15 +++++++++++++++
 block/blk-mq.c       | 18 +++++++++++++++++-
 3 files changed, 40 insertions(+), 1 deletion(-)
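To make the commit message concrete: with the flag set, the scheduler
draws tags straight from the hw pool instead of a private .sched_tags
pool. The helper below is a hypothetical illustration of that selection,
not code from this series:

	/*
	 * Hypothetical illustration: pick the tag pool the scheduler
	 * allocates from. With BLK_MQ_F_SCHED_USE_HW_TAG set, the hw
	 * tags double as scheduler tags; otherwise the private
	 * .sched_tags pool is used.
	 */
	static inline struct blk_mq_tags *
	blk_mq_sched_tag_pool(struct blk_mq_hw_ctx *hctx)
	{
		if (hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG)
			return hctx->tags;
		return hctx->sched_tags;
	}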