diff mbox series

[10/14] blk-mq: initial support for multiple queue maps

Message ID 20181029163738.10172-11-axboe@kernel.dk (mailing list archive)
State Superseded
Headers show
Series blk-mq: Add support for multiple queue maps | expand

Commit Message

Jens Axboe Oct. 29, 2018, 4:37 p.m. UTC
Add a queue offset to the tag map. This enables users to map
iteratively, for each queue map type they support.

Bump maximum number of supported maps to 2, we're now fully
able to support more than 1 map.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq-cpumap.c  | 9 +++++----
 block/blk-mq-pci.c     | 2 +-
 block/blk-mq-virtio.c  | 2 +-
 include/linux/blk-mq.h | 3 ++-
 4 files changed, 9 insertions(+), 7 deletions(-)

Comments

Bart Van Assche Oct. 29, 2018, 7:40 p.m. UTC | #1
On Mon, 2018-10-29 at 10:37 -0600, Jens Axboe wrote:
> -static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> +static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
> +			      unsigned int nr_queues, const int cpu)
>  {
> -	return cpu % nr_queues;
> +	return qmap->queue_offset + (cpu % nr_queues);
>  }
> 
> [ ... ]
>  
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -78,10 +78,11 @@ struct blk_mq_hw_ctx {
>  struct blk_mq_queue_map {
>  	unsigned int *mq_map;
>  	unsigned int nr_queues;
> +	unsigned int queue_offset;
>  };

I think it's unfortunate that the blk-mq core uses the .queue_offset member but
that mapping functions in block drivers are responsible for setting that member.
Since the block driver mapping functions have to set blk_mq_queue_map.nr_queues,
how about adding a loop in blk_mq_update_queue_map() that derives .queue_offset
from .nr_queues from previous array entries?

Thanks,

Bart.
Jens Axboe Oct. 29, 2018, 7:53 p.m. UTC | #2
On 10/29/18 1:40 PM, Bart Van Assche wrote:
> On Mon, 2018-10-29 at 10:37 -0600, Jens Axboe wrote:
>> -static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>> +static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
>> +			      unsigned int nr_queues, const int cpu)
>>  {
>> -	return cpu % nr_queues;
>> +	return qmap->queue_offset + (cpu % nr_queues);
>>  }
>>
>> [ ... ]
>>  
>> --- a/include/linux/blk-mq.h
>> +++ b/include/linux/blk-mq.h
>> @@ -78,10 +78,11 @@ struct blk_mq_hw_ctx {
>>  struct blk_mq_queue_map {
>>  	unsigned int *mq_map;
>>  	unsigned int nr_queues;
>> +	unsigned int queue_offset;
>>  };
> 
> I think it's unfortunate that the blk-mq core uses the .queue_offset member but
> that mapping functions in block drivers are responsible for setting that member.
> Since the block driver mapping functions have to set blk_mq_queue_map.nr_queues,
> how about adding a loop in blk_mq_update_queue_map() that derives .queue_offset
> from .nr_queues from previous array entries?

It's not a simple increment, so the driver has to be the one setting it. If
we end up sharing queues, for instance, then the driver will need to set
it to the start offset of that set. If you go two patches forward you
can see that exact construct.

IOW, it's the driver that controls the offset, not the core.
Bart Van Assche Oct. 29, 2018, 8 p.m. UTC | #3
On Mon, 2018-10-29 at 13:53 -0600, Jens Axboe wrote:
> On 10/29/18 1:40 PM, Bart Van Assche wrote:
> > On Mon, 2018-10-29 at 10:37 -0600, Jens Axboe wrote:
> > > -static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
> > > +static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
> > > +			      unsigned int nr_queues, const int cpu)
> > >  {
> > > -	return cpu % nr_queues;
> > > +	return qmap->queue_offset + (cpu % nr_queues);
> > >  }
> > > 
> > > [ ... ]
> > >  
> > > --- a/include/linux/blk-mq.h
> > > +++ b/include/linux/blk-mq.h
> > > @@ -78,10 +78,11 @@ struct blk_mq_hw_ctx {
> > >  struct blk_mq_queue_map {
> > >  	unsigned int *mq_map;
> > >  	unsigned int nr_queues;
> > > +	unsigned int queue_offset;
> > >  };
> > 
> > I think it's unfortunate that the blk-mq core uses the .queue_offset member but
> > that mapping functions in block drivers are responsible for setting that member.
> > Since the block driver mapping functions have to set blk_mq_queue_map.nr_queues,
> > how about adding a loop in blk_mq_update_queue_map() that derives .queue_offset
> > from .nr_queues from previous array entries?
> 
> It's not a simple increment, so the driver has to be the one setting it. If
> we end up sharing queues, for instance, then the driver will need to set
> it to the start offset of that set. If you go two patches forward you
> can see that exact construct.
> 
> IOW, it's the driver that controls the offset, not the core.

If sharing of hardware queues between hardware queue types is supported,
what should hctx->type be set to? Additionally, patch 5 adds code that uses
hctx->type as an array index. How can that code work if a single hardware
queue can be shared by multiple hardware queue types?

Thanks,

Bart.
Jens Axboe Oct. 29, 2018, 8:09 p.m. UTC | #4
On 10/29/18 2:00 PM, Bart Van Assche wrote:
> On Mon, 2018-10-29 at 13:53 -0600, Jens Axboe wrote:
>> On 10/29/18 1:40 PM, Bart Van Assche wrote:
>>> On Mon, 2018-10-29 at 10:37 -0600, Jens Axboe wrote:
>>>> -static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
>>>> +static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
>>>> +			      unsigned int nr_queues, const int cpu)
>>>>  {
>>>> -	return cpu % nr_queues;
>>>> +	return qmap->queue_offset + (cpu % nr_queues);
>>>>  }
>>>>
>>>> [ ... ]
>>>>  
>>>> --- a/include/linux/blk-mq.h
>>>> +++ b/include/linux/blk-mq.h
>>>> @@ -78,10 +78,11 @@ struct blk_mq_hw_ctx {
>>>>  struct blk_mq_queue_map {
>>>>  	unsigned int *mq_map;
>>>>  	unsigned int nr_queues;
>>>> +	unsigned int queue_offset;
>>>>  };
>>>
>>> I think it's unfortunate that the blk-mq core uses the .queue_offset member but
>>> that mapping functions in block drivers are responsible for setting that member.
>>> Since the block driver mapping functions have to set blk_mq_queue_map.nr_queues,
>>> how about adding a loop in blk_mq_update_queue_map() that derives .queue_offset
>>> from .nr_queues from previous array entries?
>>
>> It's not a simple increment, so the driver has to be the one setting it. If
>> we end up sharing queues, for instance, then the driver will need to set
>> it to the start offset of that set. If you go two patches forward you
>> can see that exact construct.
>>
>> IOW, it's the driver that controls the offset, not the core.
> 
> If sharing of hardware queues between hardware queue types is supported,
> what should hctx->type be set to? Additionally, patch 5 adds code that uses
> hctx->type as an array index. How can that code work if a single hardware
> queue can be shared by multiple hardware queue types?

hctx->type will be set to the value of the first type. This is all driver
private, blk-mq could not care less what the value of the type means.

As to the other question, it works just fine since that is the queue
that is being accessed. There's no confusion there. I think you're
misunderstanding how it's set up. To use nvme as the example, type 0
would be reads, 1 writes, and 2 pollable queues. If reads and writes
share the same set of hardware queues, then type 1 simply doesn't
exist in terms of ->flags_to_type() return value. This is purely
driven by the driver. That hook is the only decider of where something
will go. If we share hctx sets, we share the same hardware queue as
well. There is just the one set for that case.
Bart Van Assche Oct. 29, 2018, 8:25 p.m. UTC | #5
On Mon, 2018-10-29 at 14:09 -0600, Jens Axboe wrote:
> hctx->type will be set to the value of the first type. This is all driver
> private, blk-mq could not care less what the value of the type means.
> 
> As to the other question, it works just fine since that is the queue
> that is being accessed. There's no confusion there. I think you're
> misunderstanding how it's set up. To use nvme as the example, type 0
> would be reads, 1 writes, and 2 pollable queues. If reads and writes
> share the same set of hardware queues, then type 1 simply doesn't
> exist in terms of ->flags_to_type() return value. This is purely
> driven by the driver. That hook is the only decider of where something
> will go. If we share hctx sets, we share the same hardware queue as
> well. There is just the one set for that case.

How about adding a comment in blk-mq.h that explains that hardware queues can
be shared among different hardware queue types? I think this is nontrivial and
deserves a comment.

Thanks,

Bart.
Jens Axboe Oct. 29, 2018, 8:29 p.m. UTC | #6
On 10/29/18 2:25 PM, Bart Van Assche wrote:
> On Mon, 2018-10-29 at 14:09 -0600, Jens Axboe wrote:
>> hctx->type will be set to the value of the first type. This is all driver
>> private, blk-mq could not care less what the value of the type means.
>>
>> As to the other question, it works just fine since that is the queue
>> that is being accessed. There's no confusion there. I think you're
>> misunderstanding how it's set up. To use nvme as the example, type 0
>> would be reads, 1 writes, and 2 pollable queues. If reads and writes
>> share the same set of hardware queues, then type 1 simply doesn't
>> exist in terms of ->flags_to_type() return value. This is purely
>> driven by the driver. That hook is the only decider of where something
>> will go. If we share hctx sets, we share the same hardware queue as
>> well. There is just the one set for that case.
> 
> How about adding a comment in blk-mq.h that explains that hardware queues can
> be shared among different hardware queue types? I think this is nontrivial and
> deserves a comment.

Sure, I can do that. I guess a key concept that is confusing based on
your above question is that the sets don't have to be consecutive.
It's perfectly valid to have 0 and 2 be the available queues, and
nothing for 1. For example.

BTW, I split up the incremental patch; find the pieces here:

http://git.kernel.dk/cgit/linux-block/commit/?h=mq-maps&id=6890d88deecfd3723ce620d82f5fc80485f9caec

and

http://git.kernel.dk/cgit/linux-block/commit/?h=mq-maps&id=907725dff2f8cc6d1502a9123f930b8d3708bd02
diff mbox series

Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 6e6686c55984..03a534820271 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -14,9 +14,10 @@ 
 #include "blk.h"
 #include "blk-mq.h"
 
-static int cpu_to_queue_index(unsigned int nr_queues, const int cpu)
+static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
+			      unsigned int nr_queues, const int cpu)
 {
-	return cpu % nr_queues;
+	return qmap->queue_offset + (cpu % nr_queues);
 }
 
 static int get_first_sibling(unsigned int cpu)
@@ -44,11 +45,11 @@  int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 		 * performace optimizations.
 		 */
 		if (cpu < nr_queues) {
-			map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
 		} else {
 			first_sibling = get_first_sibling(cpu);
 			if (first_sibling == cpu)
-				map[cpu] = cpu_to_queue_index(nr_queues, cpu);
+				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
 			else
 				map[cpu] = map[first_sibling];
 		}
diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 40333d60a850..1dce18553984 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -43,7 +43,7 @@  int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
 			goto fallback;
 
 		for_each_cpu(cpu, mask)
-			qmap->mq_map[cpu] = queue;
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
 	}
 
 	return 0;
diff --git a/block/blk-mq-virtio.c b/block/blk-mq-virtio.c
index 661fbfef480f..370827163835 100644
--- a/block/blk-mq-virtio.c
+++ b/block/blk-mq-virtio.c
@@ -44,7 +44,7 @@  int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
 			goto fallback;
 
 		for_each_cpu(cpu, mask)
-			qmap->mq_map[cpu] = queue;
+			qmap->mq_map[cpu] = qmap->queue_offset + queue;
 	}
 
 	return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 837087cf07cc..b5ae2b5677c1 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -78,10 +78,11 @@  struct blk_mq_hw_ctx {
 struct blk_mq_queue_map {
 	unsigned int *mq_map;
 	unsigned int nr_queues;
+	unsigned int queue_offset;
 };
 
 enum {
-	HCTX_MAX_TYPES = 1,
+	HCTX_MAX_TYPES = 2,
 };
 
 struct blk_mq_tag_set {