diff mbox

[0/4] blk-mq-sched: allow to use hw tag for sched

Message ID da795dd0-bba1-3a7a-5b24-696d51bcbc2e@kernel.dk (mailing list archive)
State New, archived
Headers show

Commit Message

Jens Axboe April 26, 2017, 6:22 p.m. UTC
On 04/26/2017 11:15 AM, Jens Axboe wrote:
> On 04/26/2017 03:48 AM, Ming Lei wrote:
>> Hi Christoph,
>>
>> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
>>> Hi Christoph,
>>>
>>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
>>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>>>>> If we don't need to reserve tag for internal command, I am happy
>>>>> to fix it in the interim. However, the mtip32xx driver has to
>>>>> ask blk-mq to reserve the tag zero for internal command. Then
>>>>> if we can't allow the driver to use that reserved tag, it looks
>>>>> quite awkward, doesn't it?
>>>>
>>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
>>>>
>>>> But I'm pretty sure the hardware wouldn't require a specific tag
>>>> for the internal ops.
>>>
>>> From mtip32xx code, the tag 0 is used to send internal command only, and
>>> not used for submitting normal request.
>>>
>>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
>>> even same hardware address is writen to when issuing non-ncq and ncq
>>> commands. So I think the internal ops and normal requests share one same
>>> tag space on mtip32xx.
>>>
>>> Could you explain it a bit why you think the hw doesn't need the tag
>>> for internal ops in this case?
>>
>> Gentle ping...
>>
>> Looks there are some choices for this issue:
>>
>> 1) if internal ops uses independent tag space
>> - we need to clean up the mtip32xx driver
>>
>> 2) if internal ops shares tag space with normal request
>> - export blk_mq_get_driver_tag() for mtip32xx
>>
>> or
>>
>> - use private way to send internal ops, and the drawback
>> is that we still need to reserve tag and the reserved tag
>> can't be used at all.
>>
>> From mtip32xx source code, I can see both share same tag space.
>> When I try to search hw manual via google, not succeed yet.
>>
>> Christoph, could you share us your findings about if internal ops
>> uses independent tag space or not?
> 
> Why aren't we just bypassing for RESERVED? If someone is asking
> for an reserved tag, ensure that we use the right tag pool (sched
> vs normal) and sub pool (reserved vs normal). Then insert just
> bypasses the scheduler, like it already does if we have a driver
> tag assigned already.
> 
> I've got a few of these cards, but traveling. I'll wire up something
> and test it end this week.

Something like this. Totally untested.

Comments

Ming Lei April 27, 2017, 3:14 a.m. UTC | #1
On Wed, Apr 26, 2017 at 11:22:43AM -0700, Jens Axboe wrote:
> On 04/26/2017 11:15 AM, Jens Axboe wrote:
> > On 04/26/2017 03:48 AM, Ming Lei wrote:
> >> Hi Christoph,
> >>
> >> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
> >>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
> >>>>> If we don't need to reserve tag for internal command, I am happy
> >>>>> to fix it in the interim. However, the mtip32xx driver has to
> >>>>> ask blk-mq to reserve the tag zero for internal command. Then
> >>>>> if we can't allow the driver to use that reserved tag, it looks
> >>>>> quite awkward, doesn't it?
> >>>>
> >>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
> >>>>
> >>>> But I'm pretty sure the hardware wouldn't require a specific tag
> >>>> for the internal ops.
> >>>
> >>> From mtip32xx code, the tag 0 is used to send internal command only, and
> >>> not used for submitting normal request.
> >>>
> >>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
> >>> even same hardware address is writen to when issuing non-ncq and ncq
> >>> commands. So I think the internal ops and normal requests share one same
> >>> tag space on mtip32xx.
> >>>
> >>> Could you explain it a bit why you think the hw doesn't need the tag
> >>> for internal ops in this case?
> >>
> >> Gentle ping...
> >>
> >> Looks there are some choices for this issue:
> >>
> >> 1) if internal ops uses independent tag space
> >> - we need to clean up the mtip32xx driver
> >>
> >> 2) if internal ops shares tag space with normal request
> >> - export blk_mq_get_driver_tag() for mtip32xx
> >>
> >> or
> >>
> >> - use private way to send internal ops, and the drawback
> >> is that we still need to reserve tag and the reserved tag
> >> can't be used at all.
> >>
> >> From mtip32xx source code, I can see both share same tag space.
> >> When I try to search hw manual via google, not succeed yet.
> >>
> >> Christoph, could you share us your findings about if internal ops
> >> uses independent tag space or not?
> > 
> > Why aren't we just bypassing for RESERVED? If someone is asking
> > for an reserved tag, ensure that we use the right tag pool (sched
> > vs normal) and sub pool (reserved vs normal). Then insert just
> > bypasses the scheduler, like it already does if we have a driver
> > tag assigned already.
> > 
> > I've got a few of these cards, but traveling. I'll wire up something
> > and test it end this week.
> 
> Something like this. Totally untested.
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index c974a1bbf4cb..11109b5b64b6 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -119,7 +119,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>  	if (likely(!data->hctx))
>  		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>  
> -	if (e) {
> +	/*
> +	 * For a reserved tag, allocate a normal request since we might
> +	 * have driver dependencies on the value of the internal tag.
> +	 */
> +	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
>  		data->flags |= BLK_MQ_REQ_INTERNAL;
>  
>  		/*

This one looks reasonable, since reserved rqs often needn't scheduling,
also works well on mtip32xx with another patch[1] together.

Tested-by: Ming Lei <ming.lei@redhat.com>

[1] http://marc.info/?l=linux-block&m=149256194703481&w=2

Thanks,
Ming
Jens Axboe April 27, 2017, 1:49 p.m. UTC | #2
On 04/26/2017 08:14 PM, Ming Lei wrote:
> On Wed, Apr 26, 2017 at 11:22:43AM -0700, Jens Axboe wrote:
>> On 04/26/2017 11:15 AM, Jens Axboe wrote:
>>> On 04/26/2017 03:48 AM, Ming Lei wrote:
>>>> Hi Christoph,
>>>>
>>>> On Thu, Apr 20, 2017 at 04:30:42PM +0800, Ming Lei wrote:
>>>>> Hi Christoph,
>>>>>
>>>>> On Wed, Apr 19, 2017 at 09:54:08PM -0700, Christoph Hellwig wrote:
>>>>>> On Thu, Apr 20, 2017 at 09:03:47AM +0800, Ming Lei wrote:
>>>>>>> If we don't need to reserve tag for internal command, I am happy
>>>>>>> to fix it in the interim. However, the mtip32xx driver has to
>>>>>>> ask blk-mq to reserve the tag zero for internal command. Then
>>>>>>> if we can't allow the driver to use that reserved tag, it looks
>>>>>>> quite awkward, doesn't it?
>>>>>>
>>>>>> It doesn't.  Just offset the hardware tags by 1 from the blk-mq tags.
>>>>>>
>>>>>> But I'm pretty sure the hardware wouldn't require a specific tag
>>>>>> for the internal ops.
>>>>>
>>>>> From mtip32xx code, the tag 0 is used to send internal command only, and
>>>>> not used for submitting normal request.
>>>>>
>>>>> From code of mtip_issue_non_ncq_command() and mtip_issue_ncq_command(),
>>>>> even same hardware address is writen to when issuing non-ncq and ncq
>>>>> commands. So I think the internal ops and normal requests share one same
>>>>> tag space on mtip32xx.
>>>>>
>>>>> Could you explain it a bit why you think the hw doesn't need the tag
>>>>> for internal ops in this case?
>>>>
>>>> Gentle ping...
>>>>
>>>> Looks there are some choices for this issue:
>>>>
>>>> 1) if internal ops uses independent tag space
>>>> - we need to clean up the mtip32xx driver
>>>>
>>>> 2) if internal ops shares tag space with normal request
>>>> - export blk_mq_get_driver_tag() for mtip32xx
>>>>
>>>> or
>>>>
>>>> - use private way to send internal ops, and the drawback
>>>> is that we still need to reserve tag and the reserved tag
>>>> can't be used at all.
>>>>
>>>> From mtip32xx source code, I can see both share same tag space.
>>>> When I try to search hw manual via google, not succeed yet.
>>>>
>>>> Christoph, could you share us your findings about if internal ops
>>>> uses independent tag space or not?
>>>
>>> Why aren't we just bypassing for RESERVED? If someone is asking
>>> for an reserved tag, ensure that we use the right tag pool (sched
>>> vs normal) and sub pool (reserved vs normal). Then insert just
>>> bypasses the scheduler, like it already does if we have a driver
>>> tag assigned already.
>>>
>>> I've got a few of these cards, but traveling. I'll wire up something
>>> and test it end this week.
>>
>> Something like this. Totally untested.
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index c974a1bbf4cb..11109b5b64b6 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -119,7 +119,11 @@ struct request *blk_mq_sched_get_request(struct request_queue *q,
>>  	if (likely(!data->hctx))
>>  		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
>>  
>> -	if (e) {
>> +	/*
>> +	 * For a reserved tag, allocate a normal request since we might
>> +	 * have driver dependencies on the value of the internal tag.
>> +	 */
>> +	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
>>  		data->flags |= BLK_MQ_REQ_INTERNAL;
>>  
>>  		/*
> 
> This one looks reasonable, since reserved rqs often needn't scheduling,
> also works well on mtip32xx with another patch[1] together.
> 
> Tested-by: Ming Lei <ming.lei@redhat.com>

Thanks Ming, I've added that patch and your patch to initialize the cmd
at runtime. Queued up for 4.12. Once 4.12 opens and it gets merged with
master, I'll queue up a revert of:

commit 4981d04dd8f1ab19e2cce008da556d7f099b6e68
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sat Apr 15 20:38:23 2017 +0800

    mtip32xx: pass BLK_MQ_F_NO_SCHED
Christoph Hellwig April 27, 2017, 3:20 p.m. UTC | #3
On Thu, Apr 27, 2017 at 06:49:43AM -0700, Jens Axboe wrote:
> Thanks Ming, I've added that patch and your patch to initialize the cmd
> at runtime.

I really think this is the wrong fix, and doing these sort of band-aids
just will lead to further problems done the road.
Jens Axboe April 27, 2017, 3:46 p.m. UTC | #4
On 04/27/2017 09:20 AM, Christoph Hellwig wrote:
> On Thu, Apr 27, 2017 at 06:49:43AM -0700, Jens Axboe wrote:
>> Thanks Ming, I've added that patch and your patch to initialize the cmd
>> at runtime.
> 
> I really think this is the wrong fix, and doing these sort of band-aids
> just will lead to further problems done the road.

Ming's patch is just fine, mine is a bit more of a hack. I'll throw some
cycles at fixing up mtip32xx to not have a hard wired internal tag since
it doesn't need to, and then we can drop my one-liner again.
diff mbox

Patch

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c974a1bbf4cb..11109b5b64b6 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -119,7 +119,11 @@  struct request *blk_mq_sched_get_request(struct request_queue *q,
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->ctx->cpu);
 
-	if (e) {
+	/*
+	 * For a reserved tag, allocate a normal request since we might
+	 * have driver dependencies on the value of the internal tag.
+	 */
+	if (e && !(data->flags & BLK_MQ_REQ_RESERVED)) {
 		data->flags |= BLK_MQ_REQ_INTERNAL;
 
 		/*