
[v2,01/17] ibmvfc: add vhost fields and defaults for MQ enablement

Message ID 20201202005329.4538-2-tyreld@linux.ibm.com (mailing list archive)
State Superseded
Series ibmvfc: initial MQ development

Commit Message

Tyrel Datwyler Dec. 2, 2020, 12:53 a.m. UTC
Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
 drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
 drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
 2 files changed, 19 insertions(+), 3 deletions(-)

Comments

Brian King Dec. 2, 2020, 3:14 p.m. UTC | #1
On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 42e4d35e0d35..f1d677a7423d 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	}
>  
>  	shost->transportt = ibmvfc_transport_template;
> -	shost->can_queue = max_requests;
> +	shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);

This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.

>  	shost->max_lun = max_lun;
>  	shost->max_id = max_targets;
>  	shost->max_sectors = IBMVFC_MAX_SECTORS;
>  	shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>  	shost->unique_id = shost->host_no;
> +	shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>  
>  	vhost = shost_priv(shost);
>  	INIT_LIST_HEAD(&vhost->sent);
Tyrel Datwyler Dec. 2, 2020, 5:27 p.m. UTC | #2
On 12/2/20 7:14 AM, Brian King wrote:
> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>> Introduce several new vhost fields for managing MQ state of the adapter
>> as well as initial defaults for MQ enablement.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>>  2 files changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index 42e4d35e0d35..f1d677a7423d 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>>  	}
>>  
>>  	shost->transportt = ibmvfc_transport_template;
>> -	shost->can_queue = max_requests;
>> +	shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
> 
> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.

Our max_requests is the total number of commands allowed across all queues. From
what I understand, can_queue is the total number of commands in flight allowed
for each hw queue.

        /*
         * In scsi-mq mode, the number of hardware queues supported by the LLD.
         *
         * Note: it is assumed that each hardware queue has a queue depth of
         * can_queue. In other words, the total queue depth per host
         * is nr_hw_queues * can_queue. However, for when host_tagset is set,
         * the total queue depth is can_queue.
         */

We currently don't use the host wide shared tagset.
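
For illustration, a rough sketch of what that means for the tag-space math
(this just restates the intent of the hunk above, it isn't new code):

	/*
	 * Without host_tagset, blk-mq gives each hw queue its own tag set of
	 * can_queue tags, so the total host depth is nr_hw_queues * can_queue.
	 * Dividing max_requests across the queues keeps the host-wide total
	 * at roughly max_requests.
	 */
	shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
	shost->can_queue = max_requests / IBMVFC_SCSI_HW_QUEUES; /* per-hw-queue depth */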

-Tyrel

> 
>>  	shost->max_lun = max_lun;
>>  	shost->max_id = max_targets;
>>  	shost->max_sectors = IBMVFC_MAX_SECTORS;
>>  	shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>>  	shost->unique_id = shost->host_no;
>> +	shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>>  
>>  	vhost = shost_priv(shost);
>>  	INIT_LIST_HEAD(&vhost->sent);
> 
> 
>
Brian King Dec. 4, 2020, 2:26 p.m. UTC | #3
On 12/2/20 11:27 AM, Tyrel Datwyler wrote:
> On 12/2/20 7:14 AM, Brian King wrote:
>> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>> ---
>>>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>>>  2 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index 42e4d35e0d35..f1d677a7423d 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>>>  	}
>>>  
>>>  	shost->transportt = ibmvfc_transport_template;
>>> -	shost->can_queue = max_requests;
>>> +	shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>>
>> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.
> 
> Our max_requests is the total number of commands allowed across all queues. From
> what I understand, can_queue is the total number of commands in flight allowed
> for each hw queue.
> 
>         /*
>          * In scsi-mq mode, the number of hardware queues supported by the LLD.
>          *
>          * Note: it is assumed that each hardware queue has a queue depth of
>          * can_queue. In other words, the total queue depth per host
>          * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>          * the total queue depth is can_queue.
>          */
> 
> We currently don't use the host wide shared tagset.

Ok. I missed that bit... In that case, keep in mind that we allocate only 100
event structs by default. If we slice that across IBMVFC_SCSI_HW_QUEUES (16)
queues, then we end up with only about 6 commands that can be outstanding per
queue, which is going to really hurt performance... I'd suggest bumping
IBMVFC_MAX_REQUESTS_DEFAULT up from 100 to 1000 as a starting point.
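
Rough math as a sketch (16 being the assumed IBMVFC_SCSI_HW_QUEUES value, and
1000 just a suggested starting point, not a tuned number):

	/* current:   100 requests / 16 hw queues ~=  6 tags per hw queue */
	/* suggested: 1000 requests / 16 hw queues ~= 62 tags per hw queue */
	#define IBMVFC_MAX_REQUESTS_DEFAULT	1000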

Thanks,

Brian
Hannes Reinecke Dec. 7, 2020, 11:56 a.m. UTC | #4
On 12/4/20 3:26 PM, Brian King wrote:
> On 12/2/20 11:27 AM, Tyrel Datwyler wrote:
>> On 12/2/20 7:14 AM, Brian King wrote:
>>> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>> as well as initial defaults for MQ enablement.
>>>>
>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>>> ---
>>>>   drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>>>>   drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>>>>   2 files changed, 19 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>> index 42e4d35e0d35..f1d677a7423d 100644
>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>>>>   	}
>>>>   
>>>>   	shost->transportt = ibmvfc_transport_template;
>>>> -	shost->can_queue = max_requests;
>>>> +	shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>>>
>>> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ queue depth.
>>
>> Our max_requests is the total number of commands allowed across all queues. From
>> what I understand, can_queue is the total number of commands in flight allowed
>> for each hw queue.
>>
>>          /*
>>           * In scsi-mq mode, the number of hardware queues supported by the LLD.
>>           *
>>           * Note: it is assumed that each hardware queue has a queue depth of
>>           * can_queue. In other words, the total queue depth per host
>>           * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>>           * the total queue depth is can_queue.
>>           */
>>
>> We currently don't use the host wide shared tagset.
> 
> Ok. I missed that bit... In that case, keep in mind that we allocate only 100
> event structs by default. If we slice that across IBMVFC_SCSI_HW_QUEUES (16)
> queues, then we end up with only about 6 commands that can be outstanding per
> queue, which is going to really hurt performance... I'd suggest bumping
> IBMVFC_MAX_REQUESTS_DEFAULT up from 100 to 1000 as a starting point.
> 
Before doing that I'd rather use the host-wide shared tagset.
Increasing the number of requests will increase the memory footprint of 
the driver (as each request will be statically allocated).

Cheers,

Hannes
Tyrel Datwyler Dec. 8, 2020, 10:37 p.m. UTC | #5
On 12/7/20 3:56 AM, Hannes Reinecke wrote:
> On 12/4/20 3:26 PM, Brian King wrote:
>> On 12/2/20 11:27 AM, Tyrel Datwyler wrote:
>>> On 12/2/20 7:14 AM, Brian King wrote:
>>>> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>> as well as initial defaults for MQ enablement.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>>>> ---
>>>>>   drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>>>>>   drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>>>>>   2 files changed, 19 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> index 42e4d35e0d35..f1d677a7423d 100644
>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const
>>>>> struct vio_device_id *id)
>>>>>       }
>>>>>         shost->transportt = ibmvfc_transport_template;
>>>>> -    shost->can_queue = max_requests;
>>>>> +    shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>>>>
>>>> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ
>>>> queue depth.
>>>
>>> Our max_requests is the total number of commands allowed across all queues. From
>>> what I understand, can_queue is the total number of commands in flight allowed
>>> for each hw queue.
>>>
>>>          /*
>>>           * In scsi-mq mode, the number of hardware queues supported by the LLD.
>>>           *
>>>           * Note: it is assumed that each hardware queue has a queue depth of
>>>           * can_queue. In other words, the total queue depth per host
>>>           * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>>>           * the total queue depth is can_queue.
>>>           */
>>>
>>> We currently don't use the host wide shared tagset.
>>
>> Ok. I missed that bit... In that case, keep in mind that we allocate only 100
>> event structs by default. If we slice that across IBMVFC_SCSI_HW_QUEUES (16)
>> queues, then we end up with only about 6 commands that can be outstanding per
>> queue, which is going to really hurt performance... I'd suggest bumping
>> IBMVFC_MAX_REQUESTS_DEFAULT up from 100 to 1000 as a starting point.
>>
> Before doing that I'd rather use the host-wide shared tagset.
> Increasing the number of requests will increase the memory footprint of the
> driver (as each request will be statically allocated).
> 

In the case where we use host-wide tags, how do I determine the queue depth per
hardware queue? Is it hypothetically can_queue, or is it (can_queue /
nr_hw_queues)? We want to allocate an event pool per queue, which made sense
without host-wide tags since the queue depth per hw queue is exactly can_queue.

-Tyrel
John Garry Dec. 17, 2020, 11:49 a.m. UTC | #6
On 08/12/2020 22:37, Tyrel Datwyler wrote:
> On 12/7/20 3:56 AM, Hannes Reinecke wrote:
>> On 12/4/20 3:26 PM, Brian King wrote:
>>> On 12/2/20 11:27 AM, Tyrel Datwyler wrote:
>>>> On 12/2/20 7:14 AM, Brian King wrote:
>>>>> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>>> as well as initial defaults for MQ enablement.
>>>>>>
>>>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>>>>> ---
>>>>>>    drivers/scsi/ibmvscsi/ibmvfc.c |  9 ++++++++-
>>>>>>    drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++++++++++--
>>>>>>    2 files changed, 19 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>>> index 42e4d35e0d35..f1d677a7423d 100644
>>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const
>>>>>> struct vio_device_id *id)
>>>>>>        }
>>>>>>          shost->transportt = ibmvfc_transport_template;
>>>>>> -    shost->can_queue = max_requests;
>>>>>> +    shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>>>>>
>>>>> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ
>>>>> queue depth.
>>>>
>>>> Our max_requests is the total number of commands allowed across all queues. From
>>>> what I understand, can_queue is the total number of commands in flight allowed
>>>> for each hw queue.
>>>>
>>>>           /*
>>>>            * In scsi-mq mode, the number of hardware queues supported by the LLD.
>>>>            *
>>>>            * Note: it is assumed that each hardware queue has a queue depth of
>>>>            * can_queue. In other words, the total queue depth per host
>>>>            * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>>>>            * the total queue depth is can_queue.
>>>>            */
>>>>
>>>> We currently don't use the host wide shared tagset.
>>>
>>> Ok. I missed that bit... In that case, keep in mind that we allocate only 100
>>> event structs by default. If we slice that across IBMVFC_SCSI_HW_QUEUES (16)
>>> queues, then we end up with only about 6 commands that can be outstanding per
>>> queue, which is going to really hurt performance... I'd suggest bumping
>>> IBMVFC_MAX_REQUESTS_DEFAULT up from 100 to 1000 as a starting point.
>>>
>> Before doing that I'd rather use the host-wide shared tagset.
>> Increasing the number of requests will increase the memory footprint of the
>> driver (as each request will be statically allocated).

Exposing HW queues increases the memory footprint, as we allocate the static
requests per HW queue ctx regardless of whether the shared hostwide tagset is
enabled or not. This could probably be improved.

>>
> 
> In the case where we use host-wide tags, how do I determine the queue depth per
> hardware queue? Is it hypothetically can_queue, or is it (can_queue /
> nr_hw_queues)? We want to allocate an event pool per queue, which made sense
> without host-wide tags since the queue depth per hw queue is exactly can_queue.
> 

Generally the hw queue depth should be the same as can_queue, and this applies
when hostwide shared tags are enabled as well.

We do this for hisi_sas: the host can queue a max of 4096 commands over all
queues, so we set .can_queue = 4096*, set the HW queue depth to 4096, and set
.host_tagset = 1.

* We need to reserve some commands for internal IO, so this is reduced a
little.
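
As a sketch, the probe-time setup for that model looks roughly like the
following (values are taken from the hisi_sas example above, and "reserved_tags"
is just a placeholder for the internal-IO reservation, not a real symbol):

	/*
	 * Host-wide shared tags: one tag space of can_queue tags is shared
	 * across all hw queues, so the total host depth is can_queue rather
	 * than nr_hw_queues * can_queue.
	 */
	shost->can_queue = 4096 - reserved_tags;	/* keep a few tags for internal IO */
	shost->nr_hw_queues = 16;			/* exposed hardware queues */
	shost->host_tagset = 1;				/* enable the shared host-wide tag set */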

Thanks,
John

Patch

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..f1d677a7423d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5161,12 +5161,13 @@  static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	}
 
 	shost->transportt = ibmvfc_transport_template;
-	shost->can_queue = max_requests;
+	shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
 	shost->max_lun = max_lun;
 	shost->max_id = max_targets;
 	shost->max_sectors = IBMVFC_MAX_SECTORS;
 	shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
 	shost->unique_id = shost->host_no;
+	shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
 
 	vhost = shost_priv(shost);
 	INIT_LIST_HEAD(&vhost->sent);
@@ -5178,6 +5179,12 @@  static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	vhost->partition_number = -1;
 	vhost->log_level = log_level;
 	vhost->task_set = 1;
+
+	vhost->mq_enabled = IBMVFC_MQ;
+	vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
+	vhost->using_channels = 0;
+	vhost->do_enquiry = 1;
+
 	strcpy(vhost->partition_name, "UNKNOWN");
 	init_waitqueue_head(&vhost->work_wait_q);
 	init_waitqueue_head(&vhost->init_wait_q);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 9d58cfd774d3..e095daada70e 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -41,16 +41,21 @@ 
 #define IBMVFC_DEFAULT_LOG_LEVEL	2
 #define IBMVFC_MAX_CDB_LEN		16
 #define IBMVFC_CLS3_ERROR		0
+#define IBMVFC_MQ			0
+#define IBMVFC_SCSI_CHANNELS		0
+#define IBMVFC_SCSI_HW_QUEUES		1
+#define IBMVFC_MIG_NO_SUB_TO_CRQ	0
+#define IBMVFC_MIG_NO_N_TO_M		0
 
 /*
  * Ensure we have resources for ERP and initialization:
- * 1 for ERP
  * 1 for initialization
  * 1 for NPIV Logout
  * 2 for BSG passthru
  * 2 for each discovery thread
+ * 1 ERP for each possible HW Queue
  */
-#define IBMVFC_NUM_INTERNAL_REQ	(1 + 1 + 1 + 2 + (disc_threads * 2))
+#define IBMVFC_NUM_INTERNAL_REQ	(1 + 1 + 2 + (disc_threads * 2) + IBMVFC_SCSI_HW_QUEUES)
 
 #define IBMVFC_MAD_SUCCESS		0x00
 #define IBMVFC_MAD_NOT_SUPPORTED	0xF1
@@ -826,6 +831,10 @@  struct ibmvfc_host {
 	int delay_init;
 	int scan_complete;
 	int logged_in;
+	int mq_enabled;
+	int using_channels;
+	int do_enquiry;
+	int client_scsi_channels;
 	int aborting_passthru;
 	int events_to_log;
 #define IBMVFC_AE_LINKUP	0x0001