[1/2] scsi_host: add support for request batching

Message ID	20190530112811.3066-2-pbonzini@redhat.com (mailing list archive)
State	Accepted
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Paolo Bonzini <pbonzini@redhat.com> To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: jejb@linux.ibm.com, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, stefanha@redhat.com Subject: [PATCH 1/2] scsi_host: add support for request batching Date: Thu, 30 May 2019 13:28:10 +0200 Message-Id: <20190530112811.3066-2-pbonzini@redhat.com> In-Reply-To: <20190530112811.3066-1-pbonzini@redhat.com> References: <20190530112811.3066-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	scsi: add support for request batching \| expand [0/2] scsi: add support for request batching [1/2] scsi_host: add support for request batching [2/2] virtio_scsi: implement request batching

Paolo Bonzini May 30, 2019, 11:28 a.m. UTC

This allows a list of requests to be issued, with the LLD only writing
the hardware doorbell when necessary, after the last request was prepared.
This is more efficient if we have lists of requests to issue, particularly
on virtualized hardware, where writing the doorbell is more expensive than
on real hardware.

The use case for this is plugged IO, where blk-mq flushes a batch of
requests all at once.

The API is the same as for blk-mq, just with blk-mq concepts tweaked to
fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes
a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is
extracted from the hctx and passed as a parameter.

The only complication is that blk-mq uses different plugging heuristics
depending on whether commit_rqs is present or not.  So we have two
different sets of blk_mq_ops and pick one depending on whether the
scsi_host template uses commit_rqs or not.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 37 ++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_cmnd.h |  1 +
 include/scsi/scsi_host.h | 16 ++++++++++++++--
 3 files changed, 49 insertions(+), 5 deletions(-)

Bart Van Assche May 30, 2019, 3:36 p.m. UTC | #1

On 5/30/19 4:28 AM, Paolo Bonzini wrote:
> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
> +	.get_budget	= scsi_mq_get_budget,
> +	.put_budget	= scsi_mq_put_budget,
> +	.queue_rq	= scsi_queue_rq,
> +	.complete	= scsi_softirq_done,
> +	.timeout	= scsi_timeout,
> +#ifdef CONFIG_BLK_DEBUG_FS
> +	.show_rq	= scsi_show_rq,
> +#endif
> +	.init_request	= scsi_mq_init_request,
> +	.exit_request	= scsi_mq_exit_request,
> +	.initialize_rq_fn = scsi_initialize_rq,
> +	.busy		= scsi_mq_lld_busy,
> +	.map_queues	= scsi_map_queues,
> +};
> +
> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	struct request_queue *q = hctx->queue;
> +	struct scsi_device *sdev = q->queuedata;
> +	struct Scsi_Host *shost = sdev->host;
> +
> +	shost->hostt->commit_rqs(shost, hctx->queue_num);
> +}
> +
>   static const struct blk_mq_ops scsi_mq_ops = {
>   	.get_budget	= scsi_mq_get_budget,
>   	.put_budget	= scsi_mq_put_budget,
>   	.queue_rq	= scsi_queue_rq,
> +	.commit_rqs	= scsi_commit_rqs,
>   	.complete	= scsi_softirq_done,
>   	.timeout	= scsi_timeout,
>   #ifdef CONFIG_BLK_DEBUG_FS

Hi Paolo,

Have you considered to modify the block layer such that a single 
scsi_mq_ops structure can be used for all SCSI LLD types?

Thanks,

Bart.

Paolo Bonzini May 30, 2019, 3:54 p.m. UTC | #2

On 30/05/19 17:36, Bart Van Assche wrote:
> On 5/30/19 4:28 AM, Paolo Bonzini wrote:
>> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
>> +    .get_budget    = scsi_mq_get_budget,
>> +    .put_budget    = scsi_mq_put_budget,
>> +    .queue_rq    = scsi_queue_rq,
>> +    .complete    = scsi_softirq_done,
>> +    .timeout    = scsi_timeout,
>> +#ifdef CONFIG_BLK_DEBUG_FS
>> +    .show_rq    = scsi_show_rq,
>> +#endif
>> +    .init_request    = scsi_mq_init_request,
>> +    .exit_request    = scsi_mq_exit_request,
>> +    .initialize_rq_fn = scsi_initialize_rq,
>> +    .busy        = scsi_mq_lld_busy,
>> +    .map_queues    = scsi_map_queues,
>> +};
>> +
>> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
>> +{
>> +    struct request_queue *q = hctx->queue;
>> +    struct scsi_device *sdev = q->queuedata;
>> +    struct Scsi_Host *shost = sdev->host;
>> +
>> +    shost->hostt->commit_rqs(shost, hctx->queue_num);
>> +}
>> +
>>   static const struct blk_mq_ops scsi_mq_ops = {
>>       .get_budget    = scsi_mq_get_budget,
>>       .put_budget    = scsi_mq_put_budget,
>>       .queue_rq    = scsi_queue_rq,
>> +    .commit_rqs    = scsi_commit_rqs,
>>       .complete    = scsi_softirq_done,
>>       .timeout    = scsi_timeout,
>>   #ifdef CONFIG_BLK_DEBUG_FS
> 
> Hi Paolo,
> 
> Have you considered to modify the block layer such that a single
> scsi_mq_ops structure can be used for all SCSI LLD types?

Yes, but I don't think it's possible to do it in a nice way.
Any adjustment we make to the block layer to fit the SCSI subsystem's
desires would make all other block drivers uglier, so I chose to confine
the ugliness here.

The root issue is that the SCSI subsystem is unique in how it sits on
top of the block layer; this is the famous "adapter" (or "midlayer",
though that is confusing when talking about SCSI) design that Linux
usually tries to avoid.

Paolo

Bart Van Assche May 30, 2019, 5:54 p.m. UTC | #3

On 5/30/19 8:54 AM, Paolo Bonzini wrote:
> On 30/05/19 17:36, Bart Van Assche wrote:
>> On 5/30/19 4:28 AM, Paolo Bonzini wrote:
>>> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
>>> +    .get_budget    = scsi_mq_get_budget,
>>> +    .put_budget    = scsi_mq_put_budget,
>>> +    .queue_rq    = scsi_queue_rq,
>>> +    .complete    = scsi_softirq_done,
>>> +    .timeout    = scsi_timeout,
>>> +#ifdef CONFIG_BLK_DEBUG_FS
>>> +    .show_rq    = scsi_show_rq,
>>> +#endif
>>> +    .init_request    = scsi_mq_init_request,
>>> +    .exit_request    = scsi_mq_exit_request,
>>> +    .initialize_rq_fn = scsi_initialize_rq,
>>> +    .busy        = scsi_mq_lld_busy,
>>> +    .map_queues    = scsi_map_queues,
>>> +};
>>> +
>>> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
>>> +{
>>> +    struct request_queue *q = hctx->queue;
>>> +    struct scsi_device *sdev = q->queuedata;
>>> +    struct Scsi_Host *shost = sdev->host;
>>> +
>>> +    shost->hostt->commit_rqs(shost, hctx->queue_num);
>>> +}
>>> +
>>>   static const struct blk_mq_ops scsi_mq_ops = {
>>>       .get_budget    = scsi_mq_get_budget,
>>>       .put_budget    = scsi_mq_put_budget,
>>>       .queue_rq    = scsi_queue_rq,
>>> +    .commit_rqs    = scsi_commit_rqs,
>>>       .complete    = scsi_softirq_done,
>>>       .timeout    = scsi_timeout,
>>>   #ifdef CONFIG_BLK_DEBUG_FS
>>
>> Hi Paolo,
>>
>> Have you considered to modify the block layer such that a single
>> scsi_mq_ops structure can be used for all SCSI LLD types?
> 
> Yes, but I don't think it's possible to do it in a nice way.
> Any adjustment we make to the block layer to fit the SCSI subsystem's
> desires would make all other block drivers uglier, so I chose to confine
> the ugliness here.
> 
> The root issue is that the SCSI subsystem is unique in how it sits on
> top of the block layer; this is the famous "adapter" (or "midlayer",
> though that is confusing when talking about SCSI) design that Linux
> usually tries to avoid.

As far as I can see the only impact of defining an empty commit_rqs
callback on the queueing behavior is that blk_mq_make_request() will
queue requests for multiple hwqs on the plug list instead of requests
for a single hwq. The plug list is sorted by hwq before it is submitted
to a block driver. If that helps NVMe performance it should also help
SCSI performance. How about always setting commit_rqs = scsi_commit_rqs
in scsi_mq_ops?

Thanks,

Bart.

Ming Lei May 31, 2019, 3:27 a.m. UTC | #4

On Thu, May 30, 2019 at 7:28 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> This allows a list of requests to be issued, with the LLD only writing
> the hardware doorbell when necessary, after the last request was prepared.
> This is more efficient if we have lists of requests to issue, particularly
> on virtualized hardware, where writing the doorbell is more expensive than
> on real hardware.
>
> The use case for this is plugged IO, where blk-mq flushes a batch of
> requests all at once.
>
> The API is the same as for blk-mq, just with blk-mq concepts tweaked to
> fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes
> a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is
> extracted from the hctx and passed as a parameter.
>
> The only complication is that blk-mq uses different plugging heuristics
> depending on whether commit_rqs is present or not.  So we have two
> different sets of blk_mq_ops and pick one depending on whether the
> scsi_host template uses commit_rqs or not.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c  | 37 ++++++++++++++++++++++++++++++++++---
>  include/scsi/scsi_cmnd.h |  1 +
>  include/scsi/scsi_host.h | 16 ++++++++++++++--
>  3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 601b9f1de267..eb4e67d02bfe 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>                 blk_mq_start_request(req);
>         }
>
> +       cmd->flags &= SCMD_PRESERVED_FLAGS;
>         if (sdev->simple_tags)
>                 cmd->flags |= SCMD_TAGGED;
> -       else
> -               cmd->flags &= ~SCMD_TAGGED;
> +       if (bd->last)
> +               cmd->flags |= SCMD_LAST;
>
>         scsi_init_cmd_errh(cmd);
>         cmd->scsi_done = scsi_mq_done;
> @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(__scsi_init_queue);
>
> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
> +       .get_budget     = scsi_mq_get_budget,
> +       .put_budget     = scsi_mq_put_budget,
> +       .queue_rq       = scsi_queue_rq,
> +       .complete       = scsi_softirq_done,
> +       .timeout        = scsi_timeout,
> +#ifdef CONFIG_BLK_DEBUG_FS
> +       .show_rq        = scsi_show_rq,
> +#endif
> +       .init_request   = scsi_mq_init_request,
> +       .exit_request   = scsi_mq_exit_request,
> +       .initialize_rq_fn = scsi_initialize_rq,
> +       .busy           = scsi_mq_lld_busy,
> +       .map_queues     = scsi_map_queues,
> +};
> +
> +
> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +       struct request_queue *q = hctx->queue;
> +       struct scsi_device *sdev = q->queuedata;
> +       struct Scsi_Host *shost = sdev->host;
> +
> +       shost->hostt->commit_rqs(shost, hctx->queue_num);
> +}

It should be fine to implement scsi_commit_rqs() as:

 if (shost->hostt->commit_rqs)
       shost->hostt->commit_rqs(shost, hctx->queue_num);

then scsi_mq_ops_no_commit can be saved.

Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is
returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should
have been hit from cache given .queuecommand is called via
host->hostt->queuecommand.

Not mention BLK_STS_*_RESOURCE is just often returned for small queue depth
device.

Thanks,
Ming Lei

Paolo Bonzini May 31, 2019, 9:12 a.m. UTC | #5

On 30/05/19 19:54, Bart Van Assche wrote:
> As far as I can see the only impact of defining an empty commit_rqs
> callback on the queueing behavior is that blk_mq_make_request() will
> queue requests for multiple hwqs on the plug list instead of requests
> for a single hwq. The plug list is sorted by hwq before it is submitted
> to a block driver. If that helps NVMe performance it should also help
> SCSI performance.

See the comment in blk_mq_make_request(): sorting by hwq helps NVMe
because it uses bd->last, and blk_mq_make_request() uses the presence of
the ->commit_rqs() as a sign that the driver uses bd->last.  The
heuristic basically trades latency (with plugging, the driver rings the
doorbell a bit later) for throughput (ringing the doorbell is slow, so
we want to do it less).  If the driver doesn't use bd->last and the
doorbell is always rung, plugging adds to the latency without the
throughput benefit.

All that the duplicate blk_mq_ops do is letting blk_mq_make_request()
use the same heuristic for SCSI.  This should be beneficial exactly for
the reason that you mention: if the heuristic helps non-SCSI block
driver performance, it should also help performance of SCSI drivers that
have nr_hw_queues > 1 but no commit_rqs (lpfc, qedi, qla2xxx, smartpqi,
storvsc).

Paolo

> How about always setting commit_rqs = scsi_commit_rqs
> in scsi_mq_ops?

Paolo Bonzini June 3, 2019, 8:16 a.m. UTC | #6

On 31/05/19 05:27, Ming Lei wrote:
> It should be fine to implement scsi_commit_rqs() as:
> 
>  if (shost->hostt->commit_rqs)
>        shost->hostt->commit_rqs(shost, hctx->queue_num);
> 
> then scsi_mq_ops_no_commit can be saved.
> 
> Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is
> returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should
> have been hit from cache given .queuecommand is called via
> host->hostt->queuecommand.

This is not about d-cache, it's about preserving the heuristics that
blk-mq applies depending on whether commit_rqs is there or not.

Paolo

Bart Van Assche June 4, 2019, 10:40 p.m. UTC | #7

On 5/30/19 4:28 AM, Paolo Bonzini wrote:
> This allows a list of requests to be issued, with the LLD only writing
> the hardware doorbell when necessary, after the last request was prepared.
> This is more efficient if we have lists of requests to issue, particularly
> on virtualized hardware, where writing the doorbell is more expensive than
> on real hardware.
> 
> The use case for this is plugged IO, where blk-mq flushes a batch of
> requests all at once.
> 
> The API is the same as for blk-mq, just with blk-mq concepts tweaked to
> fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes
> a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is
> extracted from the hctx and passed as a parameter.
> 
> The only complication is that blk-mq uses different plugging heuristics
> depending on whether commit_rqs is present or not.  So we have two
> different sets of blk_mq_ops and pick one depending on whether the
> scsi_host template uses commit_rqs or not.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Hannes Reinecke June 19, 2019, 8:11 a.m. UTC | #8

On 5/30/19 1:28 PM, Paolo Bonzini wrote:
> This allows a list of requests to be issued, with the LLD only writing
> the hardware doorbell when necessary, after the last request was prepared.
> This is more efficient if we have lists of requests to issue, particularly
> on virtualized hardware, where writing the doorbell is more expensive than
> on real hardware.
> 
> The use case for this is plugged IO, where blk-mq flushes a batch of
> requests all at once.
> 
> The API is the same as for blk-mq, just with blk-mq concepts tweaked to
> fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes
> a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is
> extracted from the hctx and passed as a parameter.
> 
> The only complication is that blk-mq uses different plugging heuristics
> depending on whether commit_rqs is present or not.  So we have two
> different sets of blk_mq_ops and pick one depending on whether the
> scsi_host template uses commit_rqs or not.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c  | 37 ++++++++++++++++++++++++++++++++++---
>  include/scsi/scsi_cmnd.h |  1 +
>  include/scsi/scsi_host.h | 16 ++++++++++++++--
>  3 files changed, 49 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 601b9f1de267..eb4e67d02bfe 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  		blk_mq_start_request(req);
>  	}
>  
> +	cmd->flags &= SCMD_PRESERVED_FLAGS;
>  	if (sdev->simple_tags)
>  		cmd->flags |= SCMD_TAGGED;
> -	else
> -		cmd->flags &= ~SCMD_TAGGED;
> +	if (bd->last)
> +		cmd->flags |= SCMD_LAST;
>  
>  	scsi_init_cmd_errh(cmd);
>  	cmd->scsi_done = scsi_mq_done;
> @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(__scsi_init_queue);
>  
> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
> +	.get_budget	= scsi_mq_get_budget,
> +	.put_budget	= scsi_mq_put_budget,
> +	.queue_rq	= scsi_queue_rq,
> +	.complete	= scsi_softirq_done,
> +	.timeout	= scsi_timeout,
> +#ifdef CONFIG_BLK_DEBUG_FS
> +	.show_rq	= scsi_show_rq,
> +#endif
> +	.init_request	= scsi_mq_init_request,
> +	.exit_request	= scsi_mq_exit_request,
> +	.initialize_rq_fn = scsi_initialize_rq,
> +	.busy		= scsi_mq_lld_busy,
> +	.map_queues	= scsi_map_queues,
> +};
> +
> +
> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
> +{
> +	struct request_queue *q = hctx->queue;
> +	struct scsi_device *sdev = q->queuedata;
> +	struct Scsi_Host *shost = sdev->host;
> +
> +	shost->hostt->commit_rqs(shost, hctx->queue_num);
> +}
> +
>  static const struct blk_mq_ops scsi_mq_ops = {
>  	.get_budget	= scsi_mq_get_budget,
>  	.put_budget	= scsi_mq_put_budget,
>  	.queue_rq	= scsi_queue_rq,
> +	.commit_rqs	= scsi_commit_rqs,
>  	.complete	= scsi_softirq_done,
>  	.timeout	= scsi_timeout,
>  #ifdef CONFIG_BLK_DEBUG_FS
> @@ -1845,7 +1873,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  		cmd_size += sizeof(struct scsi_data_buffer) + sgl_size;
>  
>  	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
> -	shost->tag_set.ops = &scsi_mq_ops;
> +	if (shost->hostt->commit_rqs)
> +		shost->tag_set.ops = &scsi_mq_ops;
> +	else
> +		shost->tag_set.ops = &scsi_mq_ops_no_commit;
>  	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
>  	shost->tag_set.queue_depth = shost->can_queue;
>  	shost->tag_set.cmd_size = cmd_size;
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index 76ed5e4acd38..91bd749a02f7 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -57,6 +57,7 @@ struct scsi_pointer {
>  #define SCMD_TAGGED		(1 << 0)
>  #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
>  #define SCMD_INITIALIZED	(1 << 2)
> +#define SCMD_LAST		(1 << 3)
>  /* flags preserved across unprep / reprep */
>  #define SCMD_PRESERVED_FLAGS	(SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED)
>  
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 2b539a1b3f62..28f1c9177cd2 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -80,8 +80,10 @@ struct scsi_host_template {
>  	 * command block to the LLDD.  When the driver finished
>  	 * processing the command the done callback is invoked.
>  	 *
> -	 * If queuecommand returns 0, then the HBA has accepted the
> -	 * command.  The done() function must be called on the command
> +	 * If queuecommand returns 0, then the driver has accepted the
> +	 * command.  It must also push it to the HBA if the scsi_cmnd
> +	 * flag SCMD_LAST is set, or if the driver does not implement
> +	 * commit_rqs.  The done() function must be called on the command
>  	 * when the driver has finished with it. (you may call done on the
>  	 * command before queuecommand returns, but in this case you
>  	 * *must* return 0 from queuecommand).
> @@ -109,6 +111,16 @@ struct scsi_host_template {
>  	 */
>  	int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
>  
> +	/*
> +	 * The commit_rqs function is used to trigger a hardware
> +	 * doorbell after some requests have been queued with
> +	 * queuecommand, when an error is encountered before sending
> +	 * the request with SCMD_LAST set.
> +	 *
> +	 * STATUS: OPTIONAL
> +	 */
> +	void (*commit_rqs)(struct Scsi_Host *, u16);
> +
>  	/*
>  	 * This is an error handling strategy routine.  You don't need to
>  	 * define one of these if you don't want to - there is a default
> 
I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
it's present if set, but what about requests with 'bd->last == false' ?
Is there a guarantee that they will _always_ be followed with a request
with bd->last == true?
And if so, is there a guarantee that this request is part of the same batch?

Aside from it: I think it's a good idea to match the '->last' setting
onto the SCMD_LAST flag; I would even go so far and make this an
independent patch.

Once to above points are cleared, that is.

But if that one is in, why do we need to have the separate 'commit_rqs'
callback?
Can't we let the driver decide to issue a doorbell kick (or whatever the
driver decides to do there)?
If we ensure that the SCMD_LAST flag is always set for the end of a
batch (even if this batch consists only of one request), the driver
simply can evaluate the flag and do its actions.
Why do we need a new callback here?

Cheers,

Hannes

Paolo Bonzini June 19, 2019, 10:31 a.m. UTC | #9

On 19/06/19 10:11, Hannes Reinecke wrote:
> On 5/30/19 1:28 PM, Paolo Bonzini wrote:
>> This allows a list of requests to be issued, with the LLD only writing
>> the hardware doorbell when necessary, after the last request was prepared.
>> This is more efficient if we have lists of requests to issue, particularly
>> on virtualized hardware, where writing the doorbell is more expensive than
>> on real hardware.
>>
>> The use case for this is plugged IO, where blk-mq flushes a batch of
>> requests all at once.
>>
>> The API is the same as for blk-mq, just with blk-mq concepts tweaked to
>> fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes
>> a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is
>> extracted from the hctx and passed as a parameter.
>>
>> The only complication is that blk-mq uses different plugging heuristics
>> depending on whether commit_rqs is present or not.  So we have two
>> different sets of blk_mq_ops and pick one depending on whether the
>> scsi_host template uses commit_rqs or not.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  drivers/scsi/scsi_lib.c  | 37 ++++++++++++++++++++++++++++++++++---
>>  include/scsi/scsi_cmnd.h |  1 +
>>  include/scsi/scsi_host.h | 16 ++++++++++++++--
>>  3 files changed, 49 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 601b9f1de267..eb4e67d02bfe 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		blk_mq_start_request(req);
>>  	}
>>  
>> +	cmd->flags &= SCMD_PRESERVED_FLAGS;
>>  	if (sdev->simple_tags)
>>  		cmd->flags |= SCMD_TAGGED;
>> -	else
>> -		cmd->flags &= ~SCMD_TAGGED;
>> +	if (bd->last)
>> +		cmd->flags |= SCMD_LAST;
>>  
>>  	scsi_init_cmd_errh(cmd);
>>  	cmd->scsi_done = scsi_mq_done;
>> @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>>  }
>>  EXPORT_SYMBOL_GPL(__scsi_init_queue);
>>  
>> +static const struct blk_mq_ops scsi_mq_ops_no_commit = {
>> +	.get_budget	= scsi_mq_get_budget,
>> +	.put_budget	= scsi_mq_put_budget,
>> +	.queue_rq	= scsi_queue_rq,
>> +	.complete	= scsi_softirq_done,
>> +	.timeout	= scsi_timeout,
>> +#ifdef CONFIG_BLK_DEBUG_FS
>> +	.show_rq	= scsi_show_rq,
>> +#endif
>> +	.init_request	= scsi_mq_init_request,
>> +	.exit_request	= scsi_mq_exit_request,
>> +	.initialize_rq_fn = scsi_initialize_rq,
>> +	.busy		= scsi_mq_lld_busy,
>> +	.map_queues	= scsi_map_queues,
>> +};
>> +
>> +
>> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx)
>> +{
>> +	struct request_queue *q = hctx->queue;
>> +	struct scsi_device *sdev = q->queuedata;
>> +	struct Scsi_Host *shost = sdev->host;
>> +
>> +	shost->hostt->commit_rqs(shost, hctx->queue_num);
>> +}
>> +
>>  static const struct blk_mq_ops scsi_mq_ops = {
>>  	.get_budget	= scsi_mq_get_budget,
>>  	.put_budget	= scsi_mq_put_budget,
>>  	.queue_rq	= scsi_queue_rq,
>> +	.commit_rqs	= scsi_commit_rqs,
>>  	.complete	= scsi_softirq_done,
>>  	.timeout	= scsi_timeout,
>>  #ifdef CONFIG_BLK_DEBUG_FS
>> @@ -1845,7 +1873,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>  		cmd_size += sizeof(struct scsi_data_buffer) + sgl_size;
>>  
>>  	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
>> -	shost->tag_set.ops = &scsi_mq_ops;
>> +	if (shost->hostt->commit_rqs)
>> +		shost->tag_set.ops = &scsi_mq_ops;
>> +	else
>> +		shost->tag_set.ops = &scsi_mq_ops_no_commit;
>>  	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
>>  	shost->tag_set.queue_depth = shost->can_queue;
>>  	shost->tag_set.cmd_size = cmd_size;
>> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
>> index 76ed5e4acd38..91bd749a02f7 100644
>> --- a/include/scsi/scsi_cmnd.h
>> +++ b/include/scsi/scsi_cmnd.h
>> @@ -57,6 +57,7 @@ struct scsi_pointer {
>>  #define SCMD_TAGGED		(1 << 0)
>>  #define SCMD_UNCHECKED_ISA_DMA	(1 << 1)
>>  #define SCMD_INITIALIZED	(1 << 2)
>> +#define SCMD_LAST		(1 << 3)
>>  /* flags preserved across unprep / reprep */
>>  #define SCMD_PRESERVED_FLAGS	(SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED)
>>  
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 2b539a1b3f62..28f1c9177cd2 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -80,8 +80,10 @@ struct scsi_host_template {
>>  	 * command block to the LLDD.  When the driver finished
>>  	 * processing the command the done callback is invoked.
>>  	 *
>> -	 * If queuecommand returns 0, then the HBA has accepted the
>> -	 * command.  The done() function must be called on the command
>> +	 * If queuecommand returns 0, then the driver has accepted the
>> +	 * command.  It must also push it to the HBA if the scsi_cmnd
>> +	 * flag SCMD_LAST is set, or if the driver does not implement
>> +	 * commit_rqs.  The done() function must be called on the command
>>  	 * when the driver has finished with it. (you may call done on the
>>  	 * command before queuecommand returns, but in this case you
>>  	 * *must* return 0 from queuecommand).
>> @@ -109,6 +111,16 @@ struct scsi_host_template {
>>  	 */
>>  	int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
>>  
>> +	/*
>> +	 * The commit_rqs function is used to trigger a hardware
>> +	 * doorbell after some requests have been queued with
>> +	 * queuecommand, when an error is encountered before sending
>> +	 * the request with SCMD_LAST set.
>> +	 *
>> +	 * STATUS: OPTIONAL
>> +	 */
>> +	void (*commit_rqs)(struct Scsi_Host *, u16);
>> +
>>  	/*
>>  	 * This is an error handling strategy routine.  You don't need to
>>  	 * define one of these if you don't want to - there is a default
>>
> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
> it's present if set, but what about requests with 'bd->last == false' ?
> Is there a guarantee that they will _always_ be followed with a request
> with bd->last == true?
> And if so, is there a guarantee that this request is part of the same batch?

It's complicated.  A request with bd->last == false _will_ always be
followed by a request with bd->last == true in the same batch.  However,
due to e.g. errors it may be possible that the last request is not sent.
 In that case, the block layer sends commit_rqs, as documented in the
comment above, to flush the requests that have been sent already.

So, a driver that obeys bd->last (or SCMD_LAST) but does not implement
commit_rqs is bound to have bugs, which is why this patch was not split
further.

Makes sense?

Paolo

> Aside from it: I think it's a good idea to match the '->last' setting
> onto the SCMD_LAST flag; I would even go so far and make this an
> independent patch.

> Once to above points are cleared, that is.
> 
> But if that one is in, why do we need to have the separate 'commit_rqs'
> callback?
> Can't we let the driver decide to issue a doorbell kick (or whatever the
> driver decides to do there)?
> If we ensure that the SCMD_LAST flag is always set for the end of a
> batch (even if this batch consists only of one request), the driver
> simply can evaluate the flag and do its actions.
> Why do we need a new callback here?
> 
> Cheers,
> 
> Hannes
>

Paolo Bonzini July 4, 2019, 1:19 p.m. UTC | #10

On 19/06/19 12:31, Paolo Bonzini wrote:
>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
>> it's present if set, but what about requests with 'bd->last == false' ?
>> Is there a guarantee that they will _always_ be followed with a request
>> with bd->last == true?
>> And if so, is there a guarantee that this request is part of the same batch?
> It's complicated.  A request with bd->last == false _will_ always be
> followed by a request with bd->last == true in the same batch.  However,
> due to e.g. errors it may be possible that the last request is not sent.
>  In that case, the block layer sends commit_rqs, as documented in the
> comment above, to flush the requests that have been sent already.
> 
> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement
> commit_rqs is bound to have bugs, which is why this patch was not split
> further.
> 
> Makes sense?

Hannes, can you provide your Reviewed-by?

Thanks,

Paolo

Ming Lei July 5, 2019, 1 a.m. UTC | #11

On Mon, Jun 3, 2019 at 4:16 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/05/19 05:27, Ming Lei wrote:
> > It should be fine to implement scsi_commit_rqs() as:
> >
> >  if (shost->hostt->commit_rqs)
> >        shost->hostt->commit_rqs(shost, hctx->queue_num);
> >
> > then scsi_mq_ops_no_commit can be saved.
> >
> > Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is
> > returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should
> > have been hit from cache given .queuecommand is called via
> > host->hostt->queuecommand.
>
> This is not about d-cache, it's about preserving the heuristics that
> blk-mq applies depending on whether commit_rqs is there or not.

Fair enough, at least difference would be made by the check in
blk_mq_make_request() if scsi_commit_rqs is provided unconditionally,
so looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming Lei

Hannes Reinecke July 5, 2019, 7:12 a.m. UTC | #12

On 7/4/19 3:19 PM, Paolo Bonzini wrote:
> On 19/06/19 12:31, Paolo Bonzini wrote:
>>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
>>> it's present if set, but what about requests with 'bd->last == false' ?
>>> Is there a guarantee that they will _always_ be followed with a request
>>> with bd->last == true?
>>> And if so, is there a guarantee that this request is part of the same batch?
>> It's complicated.  A request with bd->last == false _will_ always be
>> followed by a request with bd->last == true in the same batch.  However,
>> due to e.g. errors it may be possible that the last request is not sent.
>>  In that case, the block layer sends commit_rqs, as documented in the
>> comment above, to flush the requests that have been sent already.
>>
>> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement
>> commit_rqs is bound to have bugs, which is why this patch was not split
>> further.
>>
>> Makes sense?
> 
> Hannes, can you provide your Reviewed-by?
> 
Well ... since you asked for it:

Where is the 'commit_rqs' callback actually used?
I seem to be going blind, but I can't find it; should be somewhere in
the first patch, no?
As per description:

 * The commit_rqs function is used to trigger a hardware
 * doorbell after some requests have been queued with
 * queuecommand, when an error is encountered before sending
 * the request with SCMD_LAST set.

So it should be somewhere in the error path, probably scsi_error or
something. But I don't seem to be able to find it ...

Cheers,

Hannes

Stefan Hajnoczi July 5, 2019, 7:44 a.m. UTC | #13

On Fri, Jul 05, 2019 at 09:12:37AM +0200, Hannes Reinecke wrote:
> On 7/4/19 3:19 PM, Paolo Bonzini wrote:
> > On 19/06/19 12:31, Paolo Bonzini wrote:
> >>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
> >>> it's present if set, but what about requests with 'bd->last == false' ?
> >>> Is there a guarantee that they will _always_ be followed with a request
> >>> with bd->last == true?
> >>> And if so, is there a guarantee that this request is part of the same batch?
> >> It's complicated.  A request with bd->last == false _will_ always be
> >> followed by a request with bd->last == true in the same batch.  However,
> >> due to e.g. errors it may be possible that the last request is not sent.
> >>  In that case, the block layer sends commit_rqs, as documented in the
> >> comment above, to flush the requests that have been sent already.
> >>
> >> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement
> >> commit_rqs is bound to have bugs, which is why this patch was not split
> >> further.
> >>
> >> Makes sense?
> > 
> > Hannes, can you provide your Reviewed-by?
> > 
> Well ... since you asked for it:
> 
> Where is the 'commit_rqs' callback actually used?
> I seem to be going blind, but I can't find it; should be somewhere in
> the first patch, no?
> As per description:
> 
>  * The commit_rqs function is used to trigger a hardware
>  * doorbell after some requests have been queued with
>  * queuecommand, when an error is encountered before sending
>  * the request with SCMD_LAST set.
> 
> So it should be somewhere in the error path, probably scsi_error or
> something. But I don't seem to be able to find it ...

The block layer calls scsi_mq_ops->commit_rqs() from
blk_mq_dispatch_rq_list() and blk_mq_try_issue_list_directly().

Stefan

Hannes Reinecke July 5, 2019, 7:51 a.m. UTC | #14

On 7/5/19 9:44 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 05, 2019 at 09:12:37AM +0200, Hannes Reinecke wrote:
>> On 7/4/19 3:19 PM, Paolo Bonzini wrote:
>>> On 19/06/19 12:31, Paolo Bonzini wrote:
>>>>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that
>>>>> it's present if set, but what about requests with 'bd->last == false' ?
>>>>> Is there a guarantee that they will _always_ be followed with a request
>>>>> with bd->last == true?
>>>>> And if so, is there a guarantee that this request is part of the same batch?
>>>> It's complicated.  A request with bd->last == false _will_ always be
>>>> followed by a request with bd->last == true in the same batch.  However,
>>>> due to e.g. errors it may be possible that the last request is not sent.
>>>>  In that case, the block layer sends commit_rqs, as documented in the
>>>> comment above, to flush the requests that have been sent already.
>>>>
>>>> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement
>>>> commit_rqs is bound to have bugs, which is why this patch was not split
>>>> further.
>>>>
>>>> Makes sense?
>>>
>>> Hannes, can you provide your Reviewed-by?
>>>
>> Well ... since you asked for it:
>>
>> Where is the 'commit_rqs' callback actually used?
>> I seem to be going blind, but I can't find it; should be somewhere in
>> the first patch, no?
>> As per description:
>>
>>  * The commit_rqs function is used to trigger a hardware
>>  * doorbell after some requests have been queued with
>>  * queuecommand, when an error is encountered before sending
>>  * the request with SCMD_LAST set.
>>
>> So it should be somewhere in the error path, probably scsi_error or
>> something. But I don't seem to be able to find it ...
> 
> The block layer calls scsi_mq_ops->commit_rqs() from
> blk_mq_dispatch_rq_list() and blk_mq_try_issue_list_directly().
> 
Ah, right.

Now, then:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes

[1/2] scsi_host: add support for request batching

Commit Message

Comments

Patch