Message ID | 20190530112811.3066-2-pbonzini@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | scsi: add support for request batching | expand |
On 5/30/19 4:28 AM, Paolo Bonzini wrote: > +static const struct blk_mq_ops scsi_mq_ops_no_commit = { > + .get_budget = scsi_mq_get_budget, > + .put_budget = scsi_mq_put_budget, > + .queue_rq = scsi_queue_rq, > + .complete = scsi_softirq_done, > + .timeout = scsi_timeout, > +#ifdef CONFIG_BLK_DEBUG_FS > + .show_rq = scsi_show_rq, > +#endif > + .init_request = scsi_mq_init_request, > + .exit_request = scsi_mq_exit_request, > + .initialize_rq_fn = scsi_initialize_rq, > + .busy = scsi_mq_lld_busy, > + .map_queues = scsi_map_queues, > +}; > + > +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) > +{ > + struct request_queue *q = hctx->queue; > + struct scsi_device *sdev = q->queuedata; > + struct Scsi_Host *shost = sdev->host; > + > + shost->hostt->commit_rqs(shost, hctx->queue_num); > +} > + > static const struct blk_mq_ops scsi_mq_ops = { > .get_budget = scsi_mq_get_budget, > .put_budget = scsi_mq_put_budget, > .queue_rq = scsi_queue_rq, > + .commit_rqs = scsi_commit_rqs, > .complete = scsi_softirq_done, > .timeout = scsi_timeout, > #ifdef CONFIG_BLK_DEBUG_FS Hi Paolo, Have you considered to modify the block layer such that a single scsi_mq_ops structure can be used for all SCSI LLD types? Thanks, Bart.
On 30/05/19 17:36, Bart Van Assche wrote: > On 5/30/19 4:28 AM, Paolo Bonzini wrote: >> +static const struct blk_mq_ops scsi_mq_ops_no_commit = { >> + .get_budget = scsi_mq_get_budget, >> + .put_budget = scsi_mq_put_budget, >> + .queue_rq = scsi_queue_rq, >> + .complete = scsi_softirq_done, >> + .timeout = scsi_timeout, >> +#ifdef CONFIG_BLK_DEBUG_FS >> + .show_rq = scsi_show_rq, >> +#endif >> + .init_request = scsi_mq_init_request, >> + .exit_request = scsi_mq_exit_request, >> + .initialize_rq_fn = scsi_initialize_rq, >> + .busy = scsi_mq_lld_busy, >> + .map_queues = scsi_map_queues, >> +}; >> + >> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) >> +{ >> + struct request_queue *q = hctx->queue; >> + struct scsi_device *sdev = q->queuedata; >> + struct Scsi_Host *shost = sdev->host; >> + >> + shost->hostt->commit_rqs(shost, hctx->queue_num); >> +} >> + >> static const struct blk_mq_ops scsi_mq_ops = { >> .get_budget = scsi_mq_get_budget, >> .put_budget = scsi_mq_put_budget, >> .queue_rq = scsi_queue_rq, >> + .commit_rqs = scsi_commit_rqs, >> .complete = scsi_softirq_done, >> .timeout = scsi_timeout, >> #ifdef CONFIG_BLK_DEBUG_FS > > Hi Paolo, > > Have you considered to modify the block layer such that a single > scsi_mq_ops structure can be used for all SCSI LLD types? Yes, but I don't think it's possible to do it in a nice way. Any adjustment we make to the block layer to fit the SCSI subsystem's desires would make all other block drivers uglier, so I chose to confine the ugliness here. The root issue is that the SCSI subsystem is unique in how it sits on top of the block layer; this is the famous "adapter" (or "midlayer", though that is confusing when talking about SCSI) design that Linux usually tries to avoid. Paolo
On 5/30/19 8:54 AM, Paolo Bonzini wrote: > On 30/05/19 17:36, Bart Van Assche wrote: >> On 5/30/19 4:28 AM, Paolo Bonzini wrote: >>> +static const struct blk_mq_ops scsi_mq_ops_no_commit = { >>> + .get_budget = scsi_mq_get_budget, >>> + .put_budget = scsi_mq_put_budget, >>> + .queue_rq = scsi_queue_rq, >>> + .complete = scsi_softirq_done, >>> + .timeout = scsi_timeout, >>> +#ifdef CONFIG_BLK_DEBUG_FS >>> + .show_rq = scsi_show_rq, >>> +#endif >>> + .init_request = scsi_mq_init_request, >>> + .exit_request = scsi_mq_exit_request, >>> + .initialize_rq_fn = scsi_initialize_rq, >>> + .busy = scsi_mq_lld_busy, >>> + .map_queues = scsi_map_queues, >>> +}; >>> + >>> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) >>> +{ >>> + struct request_queue *q = hctx->queue; >>> + struct scsi_device *sdev = q->queuedata; >>> + struct Scsi_Host *shost = sdev->host; >>> + >>> + shost->hostt->commit_rqs(shost, hctx->queue_num); >>> +} >>> + >>> static const struct blk_mq_ops scsi_mq_ops = { >>> .get_budget = scsi_mq_get_budget, >>> .put_budget = scsi_mq_put_budget, >>> .queue_rq = scsi_queue_rq, >>> + .commit_rqs = scsi_commit_rqs, >>> .complete = scsi_softirq_done, >>> .timeout = scsi_timeout, >>> #ifdef CONFIG_BLK_DEBUG_FS >> >> Hi Paolo, >> >> Have you considered to modify the block layer such that a single >> scsi_mq_ops structure can be used for all SCSI LLD types? > > Yes, but I don't think it's possible to do it in a nice way. > Any adjustment we make to the block layer to fit the SCSI subsystem's > desires would make all other block drivers uglier, so I chose to confine > the ugliness here. > > The root issue is that the SCSI subsystem is unique in how it sits on > top of the block layer; this is the famous "adapter" (or "midlayer", > though that is confusing when talking about SCSI) design that Linux > usually tries to avoid. As far as I can see the only impact of defining an empty commit_rqs callback on the queueing behavior is that blk_mq_make_request() will queue requests for multiple hwqs on the plug list instead of requests for a single hwq. The plug list is sorted by hwq before it is submitted to a block driver. If that helps NVMe performance it should also help SCSI performance. How about always setting commit_rqs = scsi_commit_rqs in scsi_mq_ops? Thanks, Bart.
On Thu, May 30, 2019 at 7:28 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > This allows a list of requests to be issued, with the LLD only writing > the hardware doorbell when necessary, after the last request was prepared. > This is more efficient if we have lists of requests to issue, particularly > on virtualized hardware, where writing the doorbell is more expensive than > on real hardware. > > The use case for this is plugged IO, where blk-mq flushes a batch of > requests all at once. > > The API is the same as for blk-mq, just with blk-mq concepts tweaked to > fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes > a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is > extracted from the hctx and passed as a parameter. > > The only complication is that blk-mq uses different plugging heuristics > depending on whether commit_rqs is present or not. So we have two > different sets of blk_mq_ops and pick one depending on whether the > scsi_host template uses commit_rqs or not. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > drivers/scsi/scsi_lib.c | 37 ++++++++++++++++++++++++++++++++++--- > include/scsi/scsi_cmnd.h | 1 + > include/scsi/scsi_host.h | 16 ++++++++++++++-- > 3 files changed, 49 insertions(+), 5 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 601b9f1de267..eb4e67d02bfe 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > blk_mq_start_request(req); > } > > + cmd->flags &= SCMD_PRESERVED_FLAGS; > if (sdev->simple_tags) > cmd->flags |= SCMD_TAGGED; > - else > - cmd->flags &= ~SCMD_TAGGED; > + if (bd->last) > + cmd->flags |= SCMD_LAST; > > scsi_init_cmd_errh(cmd); > cmd->scsi_done = scsi_mq_done; > @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) > } > EXPORT_SYMBOL_GPL(__scsi_init_queue); > > +static const struct blk_mq_ops scsi_mq_ops_no_commit = { > + .get_budget = scsi_mq_get_budget, > + .put_budget = scsi_mq_put_budget, > + .queue_rq = scsi_queue_rq, > + .complete = scsi_softirq_done, > + .timeout = scsi_timeout, > +#ifdef CONFIG_BLK_DEBUG_FS > + .show_rq = scsi_show_rq, > +#endif > + .init_request = scsi_mq_init_request, > + .exit_request = scsi_mq_exit_request, > + .initialize_rq_fn = scsi_initialize_rq, > + .busy = scsi_mq_lld_busy, > + .map_queues = scsi_map_queues, > +}; > + > + > +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) > +{ > + struct request_queue *q = hctx->queue; > + struct scsi_device *sdev = q->queuedata; > + struct Scsi_Host *shost = sdev->host; > + > + shost->hostt->commit_rqs(shost, hctx->queue_num); > +} It should be fine to implement scsi_commit_rqs() as: if (shost->hostt->commit_rqs) shost->hostt->commit_rqs(shost, hctx->queue_num); then scsi_mq_ops_no_commit can be saved. Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should have been hit from cache given .queuecommand is called via host->hostt->queuecommand. Not mention BLK_STS_*_RESOURCE is just often returned for small queue depth device. Thanks, Ming Lei
On 30/05/19 19:54, Bart Van Assche wrote: > As far as I can see the only impact of defining an empty commit_rqs > callback on the queueing behavior is that blk_mq_make_request() will > queue requests for multiple hwqs on the plug list instead of requests > for a single hwq. The plug list is sorted by hwq before it is submitted > to a block driver. If that helps NVMe performance it should also help > SCSI performance. See the comment in blk_mq_make_request(): sorting by hwq helps NVMe because it uses bd->last, and blk_mq_make_request() uses the presence of the ->commit_rqs() as a sign that the driver uses bd->last. The heuristic basically trades latency (with plugging, the driver rings the doorbell a bit later) for throughput (ringing the doorbell is slow, so we want to do it less). If the driver doesn't use bd->last and the doorbell is always rung, plugging adds to the latency without the throughput benefit. All that the duplicate blk_mq_ops do is letting blk_mq_make_request() use the same heuristic for SCSI. This should be beneficial exactly for the reason that you mention: if the heuristic helps non-SCSI block driver performance, it should also help performance of SCSI drivers that have nr_hw_queues > 1 but no commit_rqs (lpfc, qedi, qla2xxx, smartpqi, storvsc). Paolo > How about always setting commit_rqs = scsi_commit_rqs > in scsi_mq_ops?
On 31/05/19 05:27, Ming Lei wrote: > It should be fine to implement scsi_commit_rqs() as: > > if (shost->hostt->commit_rqs) > shost->hostt->commit_rqs(shost, hctx->queue_num); > > then scsi_mq_ops_no_commit can be saved. > > Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is > returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should > have been hit from cache given .queuecommand is called via > host->hostt->queuecommand. This is not about d-cache, it's about preserving the heuristics that blk-mq applies depending on whether commit_rqs is there or not. Paolo
On 5/30/19 4:28 AM, Paolo Bonzini wrote: > This allows a list of requests to be issued, with the LLD only writing > the hardware doorbell when necessary, after the last request was prepared. > This is more efficient if we have lists of requests to issue, particularly > on virtualized hardware, where writing the doorbell is more expensive than > on real hardware. > > The use case for this is plugged IO, where blk-mq flushes a batch of > requests all at once. > > The API is the same as for blk-mq, just with blk-mq concepts tweaked to > fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes > a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is > extracted from the hctx and passed as a parameter. > > The only complication is that blk-mq uses different plugging heuristics > depending on whether commit_rqs is present or not. So we have two > different sets of blk_mq_ops and pick one depending on whether the > scsi_host template uses commit_rqs or not. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 5/30/19 1:28 PM, Paolo Bonzini wrote: > This allows a list of requests to be issued, with the LLD only writing > the hardware doorbell when necessary, after the last request was prepared. > This is more efficient if we have lists of requests to issue, particularly > on virtualized hardware, where writing the doorbell is more expensive than > on real hardware. > > The use case for this is plugged IO, where blk-mq flushes a batch of > requests all at once. > > The API is the same as for blk-mq, just with blk-mq concepts tweaked to > fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes > a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is > extracted from the hctx and passed as a parameter. > > The only complication is that blk-mq uses different plugging heuristics > depending on whether commit_rqs is present or not. So we have two > different sets of blk_mq_ops and pick one depending on whether the > scsi_host template uses commit_rqs or not. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > drivers/scsi/scsi_lib.c | 37 ++++++++++++++++++++++++++++++++++--- > include/scsi/scsi_cmnd.h | 1 + > include/scsi/scsi_host.h | 16 ++++++++++++++-- > 3 files changed, 49 insertions(+), 5 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 601b9f1de267..eb4e67d02bfe 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > blk_mq_start_request(req); > } > > + cmd->flags &= SCMD_PRESERVED_FLAGS; > if (sdev->simple_tags) > cmd->flags |= SCMD_TAGGED; > - else > - cmd->flags &= ~SCMD_TAGGED; > + if (bd->last) > + cmd->flags |= SCMD_LAST; > > scsi_init_cmd_errh(cmd); > cmd->scsi_done = scsi_mq_done; > @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) > } > EXPORT_SYMBOL_GPL(__scsi_init_queue); > > +static const struct blk_mq_ops scsi_mq_ops_no_commit = { > + .get_budget = scsi_mq_get_budget, > + .put_budget = scsi_mq_put_budget, > + .queue_rq = scsi_queue_rq, > + .complete = scsi_softirq_done, > + .timeout = scsi_timeout, > +#ifdef CONFIG_BLK_DEBUG_FS > + .show_rq = scsi_show_rq, > +#endif > + .init_request = scsi_mq_init_request, > + .exit_request = scsi_mq_exit_request, > + .initialize_rq_fn = scsi_initialize_rq, > + .busy = scsi_mq_lld_busy, > + .map_queues = scsi_map_queues, > +}; > + > + > +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) > +{ > + struct request_queue *q = hctx->queue; > + struct scsi_device *sdev = q->queuedata; > + struct Scsi_Host *shost = sdev->host; > + > + shost->hostt->commit_rqs(shost, hctx->queue_num); > +} > + > static const struct blk_mq_ops scsi_mq_ops = { > .get_budget = scsi_mq_get_budget, > .put_budget = scsi_mq_put_budget, > .queue_rq = scsi_queue_rq, > + .commit_rqs = scsi_commit_rqs, > .complete = scsi_softirq_done, > .timeout = scsi_timeout, > #ifdef CONFIG_BLK_DEBUG_FS > @@ -1845,7 +1873,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) > cmd_size += sizeof(struct scsi_data_buffer) + sgl_size; > > memset(&shost->tag_set, 0, sizeof(shost->tag_set)); > - shost->tag_set.ops = &scsi_mq_ops; > + if (shost->hostt->commit_rqs) > + shost->tag_set.ops = &scsi_mq_ops; > + else > + shost->tag_set.ops = &scsi_mq_ops_no_commit; > shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1; > shost->tag_set.queue_depth = shost->can_queue; > shost->tag_set.cmd_size = cmd_size; > diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h > index 76ed5e4acd38..91bd749a02f7 100644 > --- a/include/scsi/scsi_cmnd.h > +++ b/include/scsi/scsi_cmnd.h > @@ -57,6 +57,7 @@ struct scsi_pointer { > #define SCMD_TAGGED (1 << 0) > #define SCMD_UNCHECKED_ISA_DMA (1 << 1) > #define SCMD_INITIALIZED (1 << 2) > +#define SCMD_LAST (1 << 3) > /* flags preserved across unprep / reprep */ > #define SCMD_PRESERVED_FLAGS (SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED) > > diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h > index 2b539a1b3f62..28f1c9177cd2 100644 > --- a/include/scsi/scsi_host.h > +++ b/include/scsi/scsi_host.h > @@ -80,8 +80,10 @@ struct scsi_host_template { > * command block to the LLDD. When the driver finished > * processing the command the done callback is invoked. > * > - * If queuecommand returns 0, then the HBA has accepted the > - * command. The done() function must be called on the command > + * If queuecommand returns 0, then the driver has accepted the > + * command. It must also push it to the HBA if the scsi_cmnd > + * flag SCMD_LAST is set, or if the driver does not implement > + * commit_rqs. The done() function must be called on the command > * when the driver has finished with it. (you may call done on the > * command before queuecommand returns, but in this case you > * *must* return 0 from queuecommand). > @@ -109,6 +111,16 @@ struct scsi_host_template { > */ > int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *); > > + /* > + * The commit_rqs function is used to trigger a hardware > + * doorbell after some requests have been queued with > + * queuecommand, when an error is encountered before sending > + * the request with SCMD_LAST set. > + * > + * STATUS: OPTIONAL > + */ > + void (*commit_rqs)(struct Scsi_Host *, u16); > + > /* > * This is an error handling strategy routine. You don't need to > * define one of these if you don't want to - there is a default > I'm a bit unsure if 'bd->last' is always set; it's quite obvious that it's present if set, but what about requests with 'bd->last == false' ? Is there a guarantee that they will _always_ be followed with a request with bd->last == true? And if so, is there a guarantee that this request is part of the same batch? Aside from it: I think it's a good idea to match the '->last' setting onto the SCMD_LAST flag; I would even go so far and make this an independent patch. Once to above points are cleared, that is. But if that one is in, why do we need to have the separate 'commit_rqs' callback? Can't we let the driver decide to issue a doorbell kick (or whatever the driver decides to do there)? If we ensure that the SCMD_LAST flag is always set for the end of a batch (even if this batch consists only of one request), the driver simply can evaluate the flag and do its actions. Why do we need a new callback here? Cheers, Hannes
On 19/06/19 10:11, Hannes Reinecke wrote: > On 5/30/19 1:28 PM, Paolo Bonzini wrote: >> This allows a list of requests to be issued, with the LLD only writing >> the hardware doorbell when necessary, after the last request was prepared. >> This is more efficient if we have lists of requests to issue, particularly >> on virtualized hardware, where writing the doorbell is more expensive than >> on real hardware. >> >> The use case for this is plugged IO, where blk-mq flushes a batch of >> requests all at once. >> >> The API is the same as for blk-mq, just with blk-mq concepts tweaked to >> fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes >> a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is >> extracted from the hctx and passed as a parameter. >> >> The only complication is that blk-mq uses different plugging heuristics >> depending on whether commit_rqs is present or not. So we have two >> different sets of blk_mq_ops and pick one depending on whether the >> scsi_host template uses commit_rqs or not. >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> drivers/scsi/scsi_lib.c | 37 ++++++++++++++++++++++++++++++++++--- >> include/scsi/scsi_cmnd.h | 1 + >> include/scsi/scsi_host.h | 16 ++++++++++++++-- >> 3 files changed, 49 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> index 601b9f1de267..eb4e67d02bfe 100644 >> --- a/drivers/scsi/scsi_lib.c >> +++ b/drivers/scsi/scsi_lib.c >> @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, >> blk_mq_start_request(req); >> } >> >> + cmd->flags &= SCMD_PRESERVED_FLAGS; >> if (sdev->simple_tags) >> cmd->flags |= SCMD_TAGGED; >> - else >> - cmd->flags &= ~SCMD_TAGGED; >> + if (bd->last) >> + cmd->flags |= SCMD_LAST; >> >> scsi_init_cmd_errh(cmd); >> cmd->scsi_done = scsi_mq_done; >> @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) >> } >> EXPORT_SYMBOL_GPL(__scsi_init_queue); >> >> +static const struct blk_mq_ops scsi_mq_ops_no_commit = { >> + .get_budget = scsi_mq_get_budget, >> + .put_budget = scsi_mq_put_budget, >> + .queue_rq = scsi_queue_rq, >> + .complete = scsi_softirq_done, >> + .timeout = scsi_timeout, >> +#ifdef CONFIG_BLK_DEBUG_FS >> + .show_rq = scsi_show_rq, >> +#endif >> + .init_request = scsi_mq_init_request, >> + .exit_request = scsi_mq_exit_request, >> + .initialize_rq_fn = scsi_initialize_rq, >> + .busy = scsi_mq_lld_busy, >> + .map_queues = scsi_map_queues, >> +}; >> + >> + >> +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) >> +{ >> + struct request_queue *q = hctx->queue; >> + struct scsi_device *sdev = q->queuedata; >> + struct Scsi_Host *shost = sdev->host; >> + >> + shost->hostt->commit_rqs(shost, hctx->queue_num); >> +} >> + >> static const struct blk_mq_ops scsi_mq_ops = { >> .get_budget = scsi_mq_get_budget, >> .put_budget = scsi_mq_put_budget, >> .queue_rq = scsi_queue_rq, >> + .commit_rqs = scsi_commit_rqs, >> .complete = scsi_softirq_done, >> .timeout = scsi_timeout, >> #ifdef CONFIG_BLK_DEBUG_FS >> @@ -1845,7 +1873,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) >> cmd_size += sizeof(struct scsi_data_buffer) + sgl_size; >> >> memset(&shost->tag_set, 0, sizeof(shost->tag_set)); >> - shost->tag_set.ops = &scsi_mq_ops; >> + if (shost->hostt->commit_rqs) >> + shost->tag_set.ops = &scsi_mq_ops; >> + else >> + shost->tag_set.ops = &scsi_mq_ops_no_commit; >> shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1; >> shost->tag_set.queue_depth = shost->can_queue; >> shost->tag_set.cmd_size = cmd_size; >> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h >> index 76ed5e4acd38..91bd749a02f7 100644 >> --- a/include/scsi/scsi_cmnd.h >> +++ b/include/scsi/scsi_cmnd.h >> @@ -57,6 +57,7 @@ struct scsi_pointer { >> #define SCMD_TAGGED (1 << 0) >> #define SCMD_UNCHECKED_ISA_DMA (1 << 1) >> #define SCMD_INITIALIZED (1 << 2) >> +#define SCMD_LAST (1 << 3) >> /* flags preserved across unprep / reprep */ >> #define SCMD_PRESERVED_FLAGS (SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED) >> >> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h >> index 2b539a1b3f62..28f1c9177cd2 100644 >> --- a/include/scsi/scsi_host.h >> +++ b/include/scsi/scsi_host.h >> @@ -80,8 +80,10 @@ struct scsi_host_template { >> * command block to the LLDD. When the driver finished >> * processing the command the done callback is invoked. >> * >> - * If queuecommand returns 0, then the HBA has accepted the >> - * command. The done() function must be called on the command >> + * If queuecommand returns 0, then the driver has accepted the >> + * command. It must also push it to the HBA if the scsi_cmnd >> + * flag SCMD_LAST is set, or if the driver does not implement >> + * commit_rqs. The done() function must be called on the command >> * when the driver has finished with it. (you may call done on the >> * command before queuecommand returns, but in this case you >> * *must* return 0 from queuecommand). >> @@ -109,6 +111,16 @@ struct scsi_host_template { >> */ >> int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *); >> >> + /* >> + * The commit_rqs function is used to trigger a hardware >> + * doorbell after some requests have been queued with >> + * queuecommand, when an error is encountered before sending >> + * the request with SCMD_LAST set. >> + * >> + * STATUS: OPTIONAL >> + */ >> + void (*commit_rqs)(struct Scsi_Host *, u16); >> + >> /* >> * This is an error handling strategy routine. You don't need to >> * define one of these if you don't want to - there is a default >> > I'm a bit unsure if 'bd->last' is always set; it's quite obvious that > it's present if set, but what about requests with 'bd->last == false' ? > Is there a guarantee that they will _always_ be followed with a request > with bd->last == true? > And if so, is there a guarantee that this request is part of the same batch? It's complicated. A request with bd->last == false _will_ always be followed by a request with bd->last == true in the same batch. However, due to e.g. errors it may be possible that the last request is not sent. In that case, the block layer sends commit_rqs, as documented in the comment above, to flush the requests that have been sent already. So, a driver that obeys bd->last (or SCMD_LAST) but does not implement commit_rqs is bound to have bugs, which is why this patch was not split further. Makes sense? Paolo > Aside from it: I think it's a good idea to match the '->last' setting > onto the SCMD_LAST flag; I would even go so far and make this an > independent patch. > Once to above points are cleared, that is. > > But if that one is in, why do we need to have the separate 'commit_rqs' > callback? > Can't we let the driver decide to issue a doorbell kick (or whatever the > driver decides to do there)? > If we ensure that the SCMD_LAST flag is always set for the end of a > batch (even if this batch consists only of one request), the driver > simply can evaluate the flag and do its actions. > Why do we need a new callback here? > > Cheers, > > Hannes >
On 19/06/19 12:31, Paolo Bonzini wrote: >> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that >> it's present if set, but what about requests with 'bd->last == false' ? >> Is there a guarantee that they will _always_ be followed with a request >> with bd->last == true? >> And if so, is there a guarantee that this request is part of the same batch? > It's complicated. A request with bd->last == false _will_ always be > followed by a request with bd->last == true in the same batch. However, > due to e.g. errors it may be possible that the last request is not sent. > In that case, the block layer sends commit_rqs, as documented in the > comment above, to flush the requests that have been sent already. > > So, a driver that obeys bd->last (or SCMD_LAST) but does not implement > commit_rqs is bound to have bugs, which is why this patch was not split > further. > > Makes sense? Hannes, can you provide your Reviewed-by? Thanks, Paolo
On Mon, Jun 3, 2019 at 4:16 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 31/05/19 05:27, Ming Lei wrote: > > It should be fine to implement scsi_commit_rqs() as: > > > > if (shost->hostt->commit_rqs) > > shost->hostt->commit_rqs(shost, hctx->queue_num); > > > > then scsi_mq_ops_no_commit can be saved. > > > > Because .commit_rqs() is only called when BLK_STS_*_RESOURCE is > > returned from scsi_queue_rq(), at that time shost->hostt->commit_rqs should > > have been hit from cache given .queuecommand is called via > > host->hostt->queuecommand. > > This is not about d-cache, it's about preserving the heuristics that > blk-mq applies depending on whether commit_rqs is there or not. Fair enough, at least difference would be made by the check in blk_mq_make_request() if scsi_commit_rqs is provided unconditionally, so looks fine: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming Lei
On 7/4/19 3:19 PM, Paolo Bonzini wrote: > On 19/06/19 12:31, Paolo Bonzini wrote: >>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that >>> it's present if set, but what about requests with 'bd->last == false' ? >>> Is there a guarantee that they will _always_ be followed with a request >>> with bd->last == true? >>> And if so, is there a guarantee that this request is part of the same batch? >> It's complicated. A request with bd->last == false _will_ always be >> followed by a request with bd->last == true in the same batch. However, >> due to e.g. errors it may be possible that the last request is not sent. >> In that case, the block layer sends commit_rqs, as documented in the >> comment above, to flush the requests that have been sent already. >> >> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement >> commit_rqs is bound to have bugs, which is why this patch was not split >> further. >> >> Makes sense? > > Hannes, can you provide your Reviewed-by? > Well ... since you asked for it: Where is the 'commit_rqs' callback actually used? I seem to be going blind, but I can't find it; should be somewhere in the first patch, no? As per description: * The commit_rqs function is used to trigger a hardware * doorbell after some requests have been queued with * queuecommand, when an error is encountered before sending * the request with SCMD_LAST set. So it should be somewhere in the error path, probably scsi_error or something. But I don't seem to be able to find it ... Cheers, Hannes
On Fri, Jul 05, 2019 at 09:12:37AM +0200, Hannes Reinecke wrote: > On 7/4/19 3:19 PM, Paolo Bonzini wrote: > > On 19/06/19 12:31, Paolo Bonzini wrote: > >>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that > >>> it's present if set, but what about requests with 'bd->last == false' ? > >>> Is there a guarantee that they will _always_ be followed with a request > >>> with bd->last == true? > >>> And if so, is there a guarantee that this request is part of the same batch? > >> It's complicated. A request with bd->last == false _will_ always be > >> followed by a request with bd->last == true in the same batch. However, > >> due to e.g. errors it may be possible that the last request is not sent. > >> In that case, the block layer sends commit_rqs, as documented in the > >> comment above, to flush the requests that have been sent already. > >> > >> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement > >> commit_rqs is bound to have bugs, which is why this patch was not split > >> further. > >> > >> Makes sense? > > > > Hannes, can you provide your Reviewed-by? > > > Well ... since you asked for it: > > Where is the 'commit_rqs' callback actually used? > I seem to be going blind, but I can't find it; should be somewhere in > the first patch, no? > As per description: > > * The commit_rqs function is used to trigger a hardware > * doorbell after some requests have been queued with > * queuecommand, when an error is encountered before sending > * the request with SCMD_LAST set. > > So it should be somewhere in the error path, probably scsi_error or > something. But I don't seem to be able to find it ... The block layer calls scsi_mq_ops->commit_rqs() from blk_mq_dispatch_rq_list() and blk_mq_try_issue_list_directly(). Stefan
On 7/5/19 9:44 AM, Stefan Hajnoczi wrote: > On Fri, Jul 05, 2019 at 09:12:37AM +0200, Hannes Reinecke wrote: >> On 7/4/19 3:19 PM, Paolo Bonzini wrote: >>> On 19/06/19 12:31, Paolo Bonzini wrote: >>>>> I'm a bit unsure if 'bd->last' is always set; it's quite obvious that >>>>> it's present if set, but what about requests with 'bd->last == false' ? >>>>> Is there a guarantee that they will _always_ be followed with a request >>>>> with bd->last == true? >>>>> And if so, is there a guarantee that this request is part of the same batch? >>>> It's complicated. A request with bd->last == false _will_ always be >>>> followed by a request with bd->last == true in the same batch. However, >>>> due to e.g. errors it may be possible that the last request is not sent. >>>> In that case, the block layer sends commit_rqs, as documented in the >>>> comment above, to flush the requests that have been sent already. >>>> >>>> So, a driver that obeys bd->last (or SCMD_LAST) but does not implement >>>> commit_rqs is bound to have bugs, which is why this patch was not split >>>> further. >>>> >>>> Makes sense? >>> >>> Hannes, can you provide your Reviewed-by? >>> >> Well ... since you asked for it: >> >> Where is the 'commit_rqs' callback actually used? >> I seem to be going blind, but I can't find it; should be somewhere in >> the first patch, no? >> As per description: >> >> * The commit_rqs function is used to trigger a hardware >> * doorbell after some requests have been queued with >> * queuecommand, when an error is encountered before sending >> * the request with SCMD_LAST set. >> >> So it should be somewhere in the error path, probably scsi_error or >> something. But I don't seem to be able to find it ... > > The block layer calls scsi_mq_ops->commit_rqs() from > blk_mq_dispatch_rq_list() and blk_mq_try_issue_list_directly(). > Ah, right. Now, then: Reviewed-by: Hannes Reinecke <hare@suse.com> Cheers, Hannes
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 601b9f1de267..eb4e67d02bfe 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1673,10 +1673,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, blk_mq_start_request(req); } + cmd->flags &= SCMD_PRESERVED_FLAGS; if (sdev->simple_tags) cmd->flags |= SCMD_TAGGED; - else - cmd->flags &= ~SCMD_TAGGED; + if (bd->last) + cmd->flags |= SCMD_LAST; scsi_init_cmd_errh(cmd); cmd->scsi_done = scsi_mq_done; @@ -1807,10 +1808,37 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) } EXPORT_SYMBOL_GPL(__scsi_init_queue); +static const struct blk_mq_ops scsi_mq_ops_no_commit = { + .get_budget = scsi_mq_get_budget, + .put_budget = scsi_mq_put_budget, + .queue_rq = scsi_queue_rq, + .complete = scsi_softirq_done, + .timeout = scsi_timeout, +#ifdef CONFIG_BLK_DEBUG_FS + .show_rq = scsi_show_rq, +#endif + .init_request = scsi_mq_init_request, + .exit_request = scsi_mq_exit_request, + .initialize_rq_fn = scsi_initialize_rq, + .busy = scsi_mq_lld_busy, + .map_queues = scsi_map_queues, +}; + + +static void scsi_commit_rqs(struct blk_mq_hw_ctx *hctx) +{ + struct request_queue *q = hctx->queue; + struct scsi_device *sdev = q->queuedata; + struct Scsi_Host *shost = sdev->host; + + shost->hostt->commit_rqs(shost, hctx->queue_num); +} + static const struct blk_mq_ops scsi_mq_ops = { .get_budget = scsi_mq_get_budget, .put_budget = scsi_mq_put_budget, .queue_rq = scsi_queue_rq, + .commit_rqs = scsi_commit_rqs, .complete = scsi_softirq_done, .timeout = scsi_timeout, #ifdef CONFIG_BLK_DEBUG_FS @@ -1845,7 +1873,10 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) cmd_size += sizeof(struct scsi_data_buffer) + sgl_size; memset(&shost->tag_set, 0, sizeof(shost->tag_set)); - shost->tag_set.ops = &scsi_mq_ops; + if (shost->hostt->commit_rqs) + shost->tag_set.ops = &scsi_mq_ops; + else + shost->tag_set.ops = &scsi_mq_ops_no_commit; shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1; shost->tag_set.queue_depth = shost->can_queue; shost->tag_set.cmd_size = cmd_size; diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 76ed5e4acd38..91bd749a02f7 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -57,6 +57,7 @@ struct scsi_pointer { #define SCMD_TAGGED (1 << 0) #define SCMD_UNCHECKED_ISA_DMA (1 << 1) #define SCMD_INITIALIZED (1 << 2) +#define SCMD_LAST (1 << 3) /* flags preserved across unprep / reprep */ #define SCMD_PRESERVED_FLAGS (SCMD_UNCHECKED_ISA_DMA | SCMD_INITIALIZED) diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 2b539a1b3f62..28f1c9177cd2 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -80,8 +80,10 @@ struct scsi_host_template { * command block to the LLDD. When the driver finished * processing the command the done callback is invoked. * - * If queuecommand returns 0, then the HBA has accepted the - * command. The done() function must be called on the command + * If queuecommand returns 0, then the driver has accepted the + * command. It must also push it to the HBA if the scsi_cmnd + * flag SCMD_LAST is set, or if the driver does not implement + * commit_rqs. The done() function must be called on the command * when the driver has finished with it. (you may call done on the * command before queuecommand returns, but in this case you * *must* return 0 from queuecommand). @@ -109,6 +111,16 @@ struct scsi_host_template { */ int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *); + /* + * The commit_rqs function is used to trigger a hardware + * doorbell after some requests have been queued with + * queuecommand, when an error is encountered before sending + * the request with SCMD_LAST set. + * + * STATUS: OPTIONAL + */ + void (*commit_rqs)(struct Scsi_Host *, u16); + /* * This is an error handling strategy routine. You don't need to * define one of these if you don't want to - there is a default
This allows a list of requests to be issued, with the LLD only writing the hardware doorbell when necessary, after the last request was prepared. This is more efficient if we have lists of requests to issue, particularly on virtualized hardware, where writing the doorbell is more expensive than on real hardware. The use case for this is plugged IO, where blk-mq flushes a batch of requests all at once. The API is the same as for blk-mq, just with blk-mq concepts tweaked to fit the SCSI subsystem API: the "last" flag in blk_mq_queue_data becomes a flag in scsi_cmnd, while the queue_num in the commit_rqs callback is extracted from the hctx and passed as a parameter. The only complication is that blk-mq uses different plugging heuristics depending on whether commit_rqs is present or not. So we have two different sets of blk_mq_ops and pick one depending on whether the scsi_host template uses commit_rqs or not. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- drivers/scsi/scsi_lib.c | 37 ++++++++++++++++++++++++++++++++++--- include/scsi/scsi_cmnd.h | 1 + include/scsi/scsi_host.h | 16 ++++++++++++++-- 3 files changed, 49 insertions(+), 5 deletions(-)