Message ID | 20210503150333.130310-11-hare@suse.de (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | scsi: enabled reserved commands for LLDDs | expand |
On 5/3/21 8:03 AM, Hannes Reinecke wrote: > These commands are set aside before allocating the block-mq tag bitmap, > so they'll never show up as busy in the tag map. That doesn't sound correct to me. Should the above perhaps be changed into "blk_mq_start_request() is never called for internal commands so they'll never show up as busy in the tag map"? Thanks, Bart.
On 5/4/21 5:20 AM, Bart Van Assche wrote: > On 5/3/21 8:03 AM, Hannes Reinecke wrote: >> These commands are set aside before allocating the block-mq tag bitmap, >> so they'll never show up as busy in the tag map. > > That doesn't sound correct to me. Should the above perhaps be changed > into "blk_mq_start_request() is never called for internal commands so > they'll never show up as busy in the tag map"? > Yes, will do. Cheers, Hannes
On 04/05/2021 07:17, Hannes Reinecke wrote: > On 5/4/21 5:20 AM, Bart Van Assche wrote: >> On 5/3/21 8:03 AM, Hannes Reinecke wrote: >>> These commands are set aside before allocating the block-mq tag bitmap, >>> so they'll never show up as busy in the tag map. >> >> That doesn't sound correct to me. Should the above perhaps be changed >> into "blk_mq_start_request() is never called for internal commands so >> they'll never show up as busy in the tag map"? >> > Yes, will do. So why don't these - or shouldn't these - turn up in the busy tag map? One of the motivations to use these block requests for internal commands is that we can take advantage of the block layer handling for CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then they would be missed in blk_mq_hctx_notify_offline() -> blk_mq_hctx_has_requests(), right? And who knows what else... Thanks, John
On 5/4/21 12:55 PM, John Garry wrote: > On 04/05/2021 07:17, Hannes Reinecke wrote: >> On 5/4/21 5:20 AM, Bart Van Assche wrote: >>> On 5/3/21 8:03 AM, Hannes Reinecke wrote: >>>> These commands are set aside before allocating the block-mq tag bitmap, >>>> so they'll never show up as busy in the tag map. >>> >>> That doesn't sound correct to me. Should the above perhaps be changed >>> into "blk_mq_start_request() is never called for internal commands so >>> they'll never show up as busy in the tag map"? >>> >> Yes, will do. > > So why don't these - or shouldn't these - turn up in the busy tag map? > > One of the motivations to use these block requests for internal commands > is that we can take advantage of the block layer handling for CPU > hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then > they would be missed in blk_mq_hctx_notify_offline() -> > blk_mq_hctx_has_requests(), right? And who knows what else... > Oh, but of course it's possible to call 'start' on these requests to have them counted in the busy map. I just didn't see the need for it until now, that's all. Cheers, Hannes
On 5/4/21 6:12 AM, Hannes Reinecke wrote: > On 5/4/21 12:55 PM, John Garry wrote: >> On 04/05/2021 07:17, Hannes Reinecke wrote: >>> On 5/4/21 5:20 AM, Bart Van Assche wrote: >>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote: >>>>> These commands are set aside before allocating the block-mq tag >>>>> bitmap, >>>>> so they'll never show up as busy in the tag map. >>>> >>>> That doesn't sound correct to me. Should the above perhaps be changed >>>> into "blk_mq_start_request() is never called for internal commands so >>>> they'll never show up as busy in the tag map"? >>>> >>> Yes, will do. >> >> So why don't these - or shouldn't these - turn up in the busy tag map? >> >> One of the motivations to use these block requests for internal >> commands is that we can take advantage of the block layer handling for >> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, >> then they would be missed in blk_mq_hctx_notify_offline() -> >> blk_mq_hctx_has_requests(), right? And who knows what else... >> > Oh, but of course it's possible to call 'start' on these requests to > have them counted in the busy map. > I just didn't see the need for it until now, that's all. This is possible but this will require careful review of at least the following code paths such that nothing unexpected happens for internal commands: * The SCSI timeout code. * All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an example, scsi_host_busy() must not include LLD-internal commands. Thanks, Bart.
On 5/4/21 6:59 PM, Bart Van Assche wrote: > On 5/4/21 6:12 AM, Hannes Reinecke wrote: >> On 5/4/21 12:55 PM, John Garry wrote: >>> On 04/05/2021 07:17, Hannes Reinecke wrote: >>>> On 5/4/21 5:20 AM, Bart Van Assche wrote: >>>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote: >>>>>> These commands are set aside before allocating the block-mq tag >>>>>> bitmap, >>>>>> so they'll never show up as busy in the tag map. >>>>> >>>>> That doesn't sound correct to me. Should the above perhaps be changed >>>>> into "blk_mq_start_request() is never called for internal commands so >>>>> they'll never show up as busy in the tag map"? >>>>> >>>> Yes, will do. >>> >>> So why don't these - or shouldn't these - turn up in the busy tag map? >>> >>> One of the motivations to use these block requests for internal >>> commands is that we can take advantage of the block layer handling for >>> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, >>> then they would be missed in blk_mq_hctx_notify_offline() -> >>> blk_mq_hctx_has_requests(), right? And who knows what else... >>> >> Oh, but of course it's possible to call 'start' on these requests to >> have them counted in the busy map. >> I just didn't see the need for it until now, that's all. > > This is possible but this will require careful review of at least the > following code paths such that nothing unexpected happens for internal > commands: > * The SCSI timeout code. > * All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an > example, scsi_host_busy() must not include LLD-internal commands. > Oh, _that_ is easy. These are reserved commands, which will have the last bool argument to the iter functions set to 'true'. bool (*fn)(struct scsi_cmnd *, void *, bool) So we just need to return from the iter if the last argument is true. Cheers, Hannes
On 5/3/21 8:03 AM, Hannes Reinecke wrote: > struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, > unsigned int op, blk_mq_req_flags_t flags) > @@ -2005,6 +2009,10 @@ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, > > WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) && > ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT)); > + > + if (sdev->host->nr_reserved_cmds) > + flags |= BLK_MQ_REQ_RESERVED; > + > rq = blk_mq_alloc_request(sdev->request_queue, op, flags); > if (IS_ERR(rq)) > return NULL; Can the if-statement be removed such that scsi_get_internal_cmd() fails if sdev->host->nr_reserved_cmds == 0? I'm concerned that otherwise it will be very hard to determine which requests are internal and which ones not from inside a blk_mq_tagset_busy_iter() callback. Thanks, Bart.
On 5/5/21 2:45 AM, Bart Van Assche wrote: > On 5/3/21 8:03 AM, Hannes Reinecke wrote: >> struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, >> unsigned int op, blk_mq_req_flags_t flags) >> @@ -2005,6 +2009,10 @@ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, >> >> WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) && >> ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT)); >> + >> + if (sdev->host->nr_reserved_cmds) >> + flags |= BLK_MQ_REQ_RESERVED; >> + >> rq = blk_mq_alloc_request(sdev->request_queue, op, flags); >> if (IS_ERR(rq)) >> return NULL; > > Can the if-statement be removed such that scsi_get_internal_cmd() fails > if sdev->host->nr_reserved_cmds == 0? I'm concerned that otherwise it > will be very hard to determine which requests are internal and which > ones not from inside a blk_mq_tagset_busy_iter() callback. > Original idea was that one could use scsi_get_internal_cmd() even with nr_reserved_cmds == 0, but you are right that this will probably just lead to confusion. Will be modifying it for the next round. Cheers, Hannes
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index 2f162603876f..661ed7696562 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -469,6 +469,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) if (sht->virt_boundary_mask) shost->virt_boundary_mask = sht->virt_boundary_mask; + if (sht->nr_reserved_cmds) + shost->nr_reserved_cmds = sht->nr_reserved_cmds; + device_initialize(&shost->shost_gendev); dev_set_name(&shost->shost_gendev, "host%d", shost->host_no); shost->shost_gendev.bus = &scsi_bus_type; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index f83b04e49bae..3c83b0fabefb 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1971,7 +1971,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) tag_set->ops = &scsi_mq_ops_no_commit; tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1; tag_set->nr_maps = shost->nr_maps ? : 1; - tag_set->queue_depth = shost->can_queue; + tag_set->queue_depth = shost->can_queue + shost->nr_reserved_cmds; + tag_set->reserved_tags = shost->nr_reserved_cmds; tag_set->cmd_size = cmd_size; tag_set->numa_node = NUMA_NO_NODE; tag_set->flags = BLK_MQ_F_SHOULD_MERGE; @@ -1996,6 +1997,9 @@ void scsi_mq_destroy_tags(struct Scsi_Host *shost) * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT. * * Allocates a SCSI command for internal LLDD use. + * If 'nr_reserved_commands' is specified by the host the + * command will be allocated from the reserved tag pool; + * otherwise the normal tag pool will be used. */ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, unsigned int op, blk_mq_req_flags_t flags) @@ -2005,6 +2009,10 @@ struct scsi_cmnd *scsi_get_internal_cmd(struct scsi_device *sdev, WARN_ON_ONCE(((op & REQ_OP_MASK) != REQ_OP_SCSI_IN) && ((op & REQ_OP_MASK) != REQ_OP_SCSI_OUT)); + + if (sdev->host->nr_reserved_cmds) + flags |= BLK_MQ_REQ_RESERVED; + rq = blk_mq_alloc_request(sdev->request_queue, op, flags); if (IS_ERR(rq)) return NULL; diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index d5260d1b7b38..f4119999a402 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -371,6 +371,7 @@ static DEVICE_ATTR(eh_deadline, S_IRUGO | S_IWUSR, show_shost_eh_deadline, store shost_rd_attr(unique_id, "%u\n"); shost_rd_attr(cmd_per_lun, "%hd\n"); shost_rd_attr(can_queue, "%d\n"); +shost_rd_attr(nr_reserved_cmds, "%d\n"); shost_rd_attr(sg_tablesize, "%hu\n"); shost_rd_attr(sg_prot_tablesize, "%hu\n"); shost_rd_attr(unchecked_isa_dma, "%d\n"); @@ -422,6 +423,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = { &dev_attr_host_reset.attr, &dev_attr_eh_deadline.attr, &dev_attr_nr_hw_queues.attr, + &dev_attr_nr_reserved_cmds.attr, NULL }; diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index f115150559ca..0831b33ee186 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -367,10 +367,19 @@ struct scsi_host_template { /* * This determines if we will use a non-interrupt driven * or an interrupt driven scheme. It is set to the maximum number - * of simultaneous commands a single hw queue in HBA will accept. + * of simultaneous commands a single hw queue in HBA will accept + * excluding internal commands. */ int can_queue; + /* + * This determines how many commands the HBA will set aside + * for internal commands. This number will be added to + * @can_queue to calculate the maximum number of simultaneous + * commands sent to the host. + */ + int nr_reserved_cmds; + /* * In many instances, especially where disconnect / reconnect are * supported, our host also has an ID on the SCSI bus. If this is @@ -614,6 +623,11 @@ struct Scsi_Host { unsigned short max_cmd_len; int this_id; + + /* + * Number of commands this host can handle at the same time. + * This excludes reserved commands as specified by nr_reserved_cmds. + */ int can_queue; short cmd_per_lun; short unsigned int sg_tablesize; @@ -632,6 +646,12 @@ struct Scsi_Host { */ unsigned nr_hw_queues; unsigned nr_maps; + + /* + * Number of reserved commands to allocate, if any. + */ + unsigned nr_reserved_cmds; + unsigned active_mode:2; unsigned unchecked_isa_dma:1;