Message ID | 1566506543-1090-1-git-send-email-longli@linuxonhyperv.com (mailing list archive) |
---|---|
State | Superseded |
Series | [v2] storvsc: setup 1:1 mapping between hardware queue and CPU queue |

From: Long Li <longli@linuxonhyperv.com> Sent: Thursday, August 22, 2019 1:42 PM
>
> storvsc doesn't use a dedicated hardware queue for a given CPU queue. When
> issuing I/O, it selects returning CPU (hardware queue) dynamically based on
> vmbus channel usage across all channels.
>
> This patch advertises num_possible_cpus() as number of hardware queues. This
> will have upper layer setup 1:1 mapping between hardware queue and CPU queue
> and avoid unnecessary locking when issuing I/O.
>
> Changes:
> v2: rely on default upper layer function to map queues. (suggested by Ming Lei
> <tom.leiming@gmail.com>)
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index b89269120a2d..dfd3b76a4f89 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
>  	/*
>  	 * Set the number of HW queues we are supporting.
>  	 */
> -	if (stor_device->num_sc != 0)
> -		host->nr_hw_queues = stor_device->num_sc + 1;
> +	host->nr_hw_queues = num_possible_cpus();

For a lot of the VM sizes in Azure, num_possible_cpus() is 128, even if the VM
has only 4 or 8 or some other smaller number of vCPUs. So I'm wondering if you
really want num_present_cpus() here instead, which would include only the vCPUs
that actually exist in the VM.

Michael

>
>  	/*
>  	 * Set the error handler work queue.
> --
> 2.17.1

>>>Subject: RE: [Patch v2] storvsc: setup 1:1 mapping between hardware
>>>queue and CPU queue
>>>
>>>From: Long Li <longli@linuxonhyperv.com> Sent: Thursday, August 22, 2019 1:42 PM
>>>>
>>>> storvsc doesn't use a dedicated hardware queue for a given CPU queue.
>>>> When issuing I/O, it selects returning CPU (hardware queue)
>>>> dynamically based on vmbus channel usage across all channels.
>>>>
>>>> This patch advertises num_possible_cpus() as number of hardware
>>>> queues. This will have upper layer setup 1:1 mapping between hardware
>>>> queue and CPU queue and avoid unnecessary locking when issuing I/O.
>>>>
>>>> Changes:
>>>> v2: rely on default upper layer function to map queues. (suggested by
>>>> Ming Lei
>>>> <tom.leiming@gmail.com>)
>>>>
>>>> Signed-off-by: Long Li <longli@microsoft.com>
>>>> ---
>>>>  drivers/scsi/storvsc_drv.c | 3 +--
>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>>>> index b89269120a2d..dfd3b76a4f89 100644
>>>> --- a/drivers/scsi/storvsc_drv.c
>>>> +++ b/drivers/scsi/storvsc_drv.c
>>>> @@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
>>>>  	/*
>>>>  	 * Set the number of HW queues we are supporting.
>>>>  	 */
>>>> -	if (stor_device->num_sc != 0)
>>>> -		host->nr_hw_queues = stor_device->num_sc + 1;
>>>> +	host->nr_hw_queues = num_possible_cpus();
>>>
>>>For a lot of the VM sizes in Azure, num_possible_cpus() is 128, even if the
>>>VM has only 4 or 8 or some other smaller number of vCPUs.
>>>So I'm wondering if you really want num_present_cpus() here instead,
>>>which would include only the vCPUs that actually exist in the VM.

I think reporting num_possible_cpus() doesn't do more harm or take more
resources. Because block layer allocates map for all the possible CPUs.

The actual mapping is done in blk_mq_map_queues(), and it iterates all the
possible CPUs. If we report num_present_cpus(), the rest of the CPUs also need
to be mapped.

>>>
>>>Michael
>>>
>>>>
>>>>  	/*
>>>>  	 * Set the error handler work queue.
>>>> --
>>>> 2.17.1
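
To make the mapping argument above concrete, here is a minimal userspace sketch. It assumes a plain round-robin assignment (`cpu % nr_queues`), which is a simplification and not the actual blk_mq_map_queues() algorithm; the `sketch_*` names and `SKETCH_MAX_QUEUES` are hypothetical and exist only for this illustration. It shows that every possible CPU gets a queue no matter how many queues are advertised, and that the mapping is 1:1 only when the queue count matches the number of possible CPUs.

```c
#include <assert.h>

/* Hypothetical bound for this sketch only; not a kernel constant. */
#define SKETCH_MAX_QUEUES 1024

/*
 * Simplified stand-in for a default CPU-to-queue mapping: round-robin.
 * The real blk_mq_map_queues() is more involved than this.
 */
static unsigned int sketch_queue_for_cpu(unsigned int cpu,
					 unsigned int nr_queues)
{
	assert(nr_queues > 0);
	return cpu % nr_queues;
}

/*
 * Walk every possible CPU (as the reply says the block layer does) and
 * return the largest number of CPUs that end up sharing one queue.
 * A result of 1 means the mapping is 1:1.
 */
static unsigned int sketch_max_cpus_per_queue(unsigned int nr_possible,
					      unsigned int nr_queues)
{
	unsigned int count[SKETCH_MAX_QUEUES] = {0};
	unsigned int max = 0;

	assert(nr_queues > 0 && nr_queues <= SKETCH_MAX_QUEUES);
	for (unsigned int cpu = 0; cpu < nr_possible; cpu++) {
		unsigned int q = sketch_queue_for_cpu(cpu, nr_queues);

		if (++count[q] > max)
			max = count[q];
	}
	return max;
}
```

With 128 possible CPUs, advertising 128 queues gives each CPU its own queue, while advertising 8 queues (for example, the present-CPU count on a small VM) still maps all 128 CPUs, 16 to a queue under this simplified scheme.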

>>>Subject: RE: [Patch v2] storvsc: setup 1:1 mapping between hardware
>>>queue and CPU queue
>>>
>>>>>>Subject: RE: [Patch v2] storvsc: setup 1:1 mapping between hardware
>>>>>>queue and CPU queue
>>>>>>
>>>>>>From: Long Li <longli@linuxonhyperv.com> Sent: Thursday, August 22, 2019 1:42 PM
>>>>>>>
>>>>>>> storvsc doesn't use a dedicated hardware queue for a given CPU queue.
>>>>>>> When issuing I/O, it selects returning CPU (hardware queue)
>>>>>>> dynamically based on vmbus channel usage across all channels.
>>>>>>>
>>>>>>> This patch advertises num_possible_cpus() as number of hardware
>>>>>>> queues. This will have upper layer setup 1:1 mapping between
>>>>>>> hardware queue and CPU queue and avoid unnecessary locking when
>>>>>>> issuing I/O.
>>>>>>>
>>>>>>> Changes:
>>>>>>> v2: rely on default upper layer function to map queues. (suggested
>>>>>>> by Ming Lei
>>>>>>> <tom.leiming@gmail.com>)
>>>>>>>
>>>>>>> Signed-off-by: Long Li <longli@microsoft.com>
>>>>>>> ---
>>>>>>>  drivers/scsi/storvsc_drv.c | 3 +--
>>>>>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>>>>>>> index b89269120a2d..dfd3b76a4f89 100644
>>>>>>> --- a/drivers/scsi/storvsc_drv.c
>>>>>>> +++ b/drivers/scsi/storvsc_drv.c
>>>>>>> @@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
>>>>>>>  	/*
>>>>>>>  	 * Set the number of HW queues we are supporting.
>>>>>>>  	 */
>>>>>>> -	if (stor_device->num_sc != 0)
>>>>>>> -		host->nr_hw_queues = stor_device->num_sc + 1;
>>>>>>> +	host->nr_hw_queues = num_possible_cpus();
>>>>>>
>>>>>>For a lot of the VM sizes in Azure, num_possible_cpus() is 128, even
>>>>>>if the VM has only 4 or 8 or some other smaller number of vCPUs.
>>>>>>So I'm wondering if you really want num_present_cpus() here instead,
>>>>>>which would include only the vCPUs that actually exist in the VM.
>>>
>>>I think reporting num_possible_cpus() doesn't do more harm or take more
>>>resources. Because block layer allocates map for all the possible CPUs.
>>>
>>>The actual mapping is done in blk_mq_map_queues(), and it iterates all the
>>>possible CPUs. If we report num_present_cpus(), the rest of the CPUs also
>>>need to be mapped.

Actually, I get your point: reporting num_present_cpus() results in fewer
struct blk_mq_hw_ctx being created, so it saves memory.

If we don't plan to support adding/onlining CPUs, we should use
num_present_cpus().

>>>
>>>>>>
>>>>>>Michael
>>>>>>
>>>>>>>
>>>>>>>  	/*
>>>>>>>  	 * Set the error handler work queue.
>>>>>>> --
>>>>>>> 2.17.1
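
The trade-off reached in the reply above can be sketched in userspace as follows. This is an illustration only, not driver code: the `sketch_*` helpers and the `support_cpu_hot_add` flag are hypothetical, and the two count parameters stand in for what num_possible_cpus() and num_present_cpus() would return in the kernel. It assumes one struct blk_mq_hw_ctx per advertised hardware queue, as the reply implies.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of storvsc: choose how many hardware
 * queues to advertise. nr_possible and nr_present stand in for the
 * values num_possible_cpus() and num_present_cpus() would return.
 */
static unsigned int sketch_pick_nr_hw_queues(unsigned int nr_possible,
					     unsigned int nr_present,
					     int support_cpu_hot_add)
{
	/* Present CPUs are a subset of possible CPUs. */
	assert(nr_present > 0 && nr_present <= nr_possible);
	return support_cpu_hot_add ? nr_possible : nr_present;
}

/*
 * Hardware contexts avoided by advertising present rather than possible
 * CPUs, assuming one context per advertised hardware queue.
 */
static unsigned int sketch_hctx_saved(unsigned int nr_possible,
				      unsigned int nr_present)
{
	assert(nr_present <= nr_possible);
	return nr_possible - nr_present;
}
```

For the Azure case Michael describes (128 possible CPUs, 8 present), advertising the present count would avoid creating 120 hardware contexts, at the cost of not having a dedicated queue ready for CPUs added later. Whether that cost matters for storvsc is left open in this thread.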

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index b89269120a2d..dfd3b76a4f89 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
 	/*
 	 * Set the number of HW queues we are supporting.
 	 */
-	if (stor_device->num_sc != 0)
-		host->nr_hw_queues = stor_device->num_sc + 1;
+	host->nr_hw_queues = num_possible_cpus();
 
 	/*
 	 * Set the error handler work queue.