
[1/1] pmem: allow user to set QUEUE_FLAG_NOWAIT

Message ID 20230512104302.8527-2-kch@nvidia.com (mailing list archive)
State New, archived
Series pmem: allow user to set QUEUE_FLAG_NOWAIT

Commit Message

Chaitanya Kulkarni May 12, 2023, 10:43 a.m. UTC
Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
parameter, retaining the default behaviour when it is not set. Also,
update the respective allocation flags in the write path. The following
are the performance numbers with the io_uring fio engine for random
reads; note that the device was fully populated with a randwrite
workload before taking these numbers :-

* linux-block (for-next) # grep IOPS  pmem*fio | column -t

nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s

nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s

* linux-block (for-next) # grep cpu  pmem*fio | column -t

nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158

nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730   
nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427   
nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237   

* linux-block (for-next) # grep slat  pmem*fio | column -t
nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24

nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/nvdimm/pmem.c | 6 ++++++
 1 file changed, 6 insertions(+)
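
The diffstat above covers only the module parameter and the queue-flag
setup (the full patch appears at the end of this page); "update the
respective allocation flags in the write path" refers to the general
NOWAIT rule that any allocation made while servicing a REQ_NOWAIT bio
must not sleep. A minimal sketch of that pattern for a bio-based driver,
assuming a hypothetical per-I/O allocation (this is illustrative, not
code from the patch), might look like:

	static void example_submit_bio(struct bio *bio)
	{
		/* nowait callers must never block on an allocation */
		gfp_t gfp = (bio->bi_opf & REQ_NOWAIT) ? GFP_NOWAIT : GFP_KERNEL;
		void *ctx;	/* hypothetical per-I/O bookkeeping */

		ctx = kmalloc(64, gfp);
		if (!ctx) {
			/* sets BLK_STS_AGAIN and ends the bio, so
			 * io_uring can retry from a context that is
			 * allowed to sleep */
			bio_wouldblock_error(bio);
			return;
		}
		/* ... issue the I/O using ctx ... */
	}

With the module parameter in place, the flag would be enabled at load
time with something like "modprobe nd_pmem nowait=1" (assuming the
driver is built as the usual nd_pmem module).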

Comments

Ira Weiny May 12, 2023, 5:14 p.m. UTC | #1
Chaitanya Kulkarni wrote:
> Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
> parameter, retaining the default behaviour when it is not set. Also,
> update the respective allocation flags in the write path. The following
> are the performance numbers with the io_uring fio engine for random
> reads; note that the device was fully populated with a randwrite
> workload before taking these numbers :-

I'm not seeing any comparison with/without the option you propose?  I
assume there is some performance improvement you are trying to show?

> 
> * linux-block (for-next) # grep IOPS  pmem*fio | column -t
> 
> nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
> nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
> nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
> 
> nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
> nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
> nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s
> 
> * linux-block (for-next) # grep cpu  pmem*fio | column -t
> 
> nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
> nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
> nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
> 
> nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730   
> nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427   
> nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237   
> 
> * linux-block (for-next) # grep slat  pmem*fio | column -t
> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
> 
> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08
> 
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>  drivers/nvdimm/pmem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ceea55f621cc..38defe84de4c 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -31,6 +31,10 @@
>  #include "pfn.h"
>  #include "nd.h"
>  
> +static bool g_nowait;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");

Module parameters should be avoided.  Since I'm not clear on the
performance benefit I can't comment on alternatives.  But I strongly
suspect that this choice is not going to be desired for all devices in
all cases.

Ira

> +
>  static struct device *to_dev(struct pmem_device *pmem)
>  {
>  	/*
> @@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
>  	blk_queue_max_hw_sectors(q, UINT_MAX);
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>  	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
> +	if (g_nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
>  	if (pmem->pfn_flags & PFN_MAP)
>  		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
>  
> -- 
> 2.40.0
>
Dan Williams May 12, 2023, 6:54 p.m. UTC | #2
Chaitanya Kulkarni wrote:
> Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
> parameter, retaining the default behaviour when it is not set. Also,
> update the respective allocation flags in the write path. The following
> are the performance numbers with the io_uring fio engine for random
> reads; note that the device was fully populated with a randwrite
> workload before taking these numbers :-

Numbers look good. I see no reason for this to be optional. Just like
the brd driver always sets NOWAIT, so should pmem.
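
For reference, the brd precedent is a single unconditional flag set at
disk allocation time; roughly, paraphrased from drivers/block/brd.c
(brd_alloc(); exact context may vary by kernel version):

	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);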

> 
> * linux-block (for-next) # grep IOPS  pmem*fio | column -t
> 
> nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
> nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
> nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
> 
> nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
> nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
> nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s
> 
> * linux-block (for-next) # grep cpu  pmem*fio | column -t
> 
> nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
> nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
> nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
> 
> nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730   
> nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427   
> nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237   
> 
> * linux-block (for-next) # grep slat  pmem*fio | column -t
> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
> 
> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08

Any thoughts on why min latency went up?
Chaitanya Kulkarni May 13, 2023, 12:56 a.m. UTC | #3
On 5/12/23 10:14, Ira Weiny wrote:
> Chaitanya Kulkarni wrote:
>> Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
>> parameter, retaining the default behaviour when it is not set. Also,
>> update the respective allocation flags in the write path. The following
>> are the performance numbers with the io_uring fio engine for random
>> reads; note that the device was fully populated with a randwrite
>> workload before taking these numbers :-
> I'm not seeing any comparison with/without the option you propose?  I
> assume there is some performance improvement you are trying to show?

Not needed, see below.

>
>> * linux-block (for-next) # grep IOPS  pmem*fio | column -t
>>
>> nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
>> nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
>> nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
>>
>> nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
>> nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
>> nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s
>>
>> * linux-block (for-next) # grep cpu  pmem*fio | column -t
>>
>> nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
>> nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
>> nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
>>
>> nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730
>> nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427
>> nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237
>>
>> * linux-block (for-next) # grep slat  pmem*fio | column -t
>> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
>> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
>> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
>>
>> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
>> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
>> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08
>>
>> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
>> ---
>>   drivers/nvdimm/pmem.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index ceea55f621cc..38defe84de4c 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -31,6 +31,10 @@
>>   #include "pfn.h"
>>   #include "nd.h"
>>   
>> +static bool g_nowait;
>> +module_param_named(nowait, g_nowait, bool, 0444);
>> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
> Module parameters should be avoided.  Since I'm not clear on the
> performance benefit I can't comment on alternatives.  But I strongly
> suspect that this choice is not going to be desired for all devices in
> all cases.
>
> Ira

Me neither; that is why I added it as a module parameter, since I don't
have access to all the devices and cannot cover all the cases to
generate quantitative data. Sending out v2 without the mod param.

-ck
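
Presumably the v2 change then reduces to setting the flag unconditionally
next to the other queue flags in pmem_attach_disk(); a sketch of what
that would look like (the actual v2 is not part of this thread):

	blk_queue_max_hw_sectors(q, UINT_MAX);
	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);	/* unconditional */
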
Chaitanya Kulkarni May 13, 2023, 12:58 a.m. UTC | #4
On 5/12/23 11:54, Dan Williams wrote:
> Chaitanya Kulkarni wrote:
>> Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
>> parameter, retaining the default behaviour when it is not set. Also,
>> update the respective allocation flags in the write path. The following
>> are the performance numbers with the io_uring fio engine for random
>> reads; note that the device was fully populated with a randwrite
>> workload before taking these numbers :-
> Numbers look good. I see no reason for this to be optional. Just like
> the brd driver always sets NOWAIT, so should pmem.

Yes, sending out v2 without the mod param.

>
>> * linux-block (for-next) # grep IOPS  pmem*fio | column -t
>>
>> nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
>> nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
>> nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
>>
>> nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
>> nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
>> nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s
>>
>> * linux-block (for-next) # grep cpu  pmem*fio | column -t
>>
>> nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
>> nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
>> nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
>>
>> nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730
>> nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427
>> nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237
>>
>> * linux-block (for-next) # grep slat  pmem*fio | column -t
>> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
>> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
>> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
>>
>> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
>> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
>> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08
> Any thoughts on why min latency went up?

Really not sure why; I will need some more time to investigate that.

-ck
Jane Chu May 15, 2023, 7:54 p.m. UTC | #5
Hi,

Does it make sense to mark this patch a candidate for the stable branch?

thanks!
-jane

On 5/12/2023 3:43 AM, Chaitanya Kulkarni wrote:
> Allow the user to optionally set QUEUE_FLAG_NOWAIT using a module
> parameter, retaining the default behaviour when it is not set. Also,
> update the respective allocation flags in the write path. The following
> are the performance numbers with the io_uring fio engine for random
> reads; note that the device was fully populated with a randwrite
> workload before taking these numbers :-
> 
> * linux-block (for-next) # grep IOPS  pmem*fio | column -t
> 
> nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
> nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
> nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
> 
> nowait-on-1.fio:   read:  IOPS=5909k,  BW=22.5GiB/s
> nowait-on-2.fio:   read:  IOPS=5997k,  BW=22.9GiB/s
> nowait-on-3.fio:   read:  IOPS=6006k,  BW=22.9GiB/s
> 
> * linux-block (for-next) # grep cpu  pmem*fio | column -t
> 
> nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
> nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
> nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
> 
> nowait-on-1.fio:  cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730
> nowait-on-2.fio:  cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427
> nowait-on-3.fio:  cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237
> 
> * linux-block (for-next) # grep slat  pmem*fio | column -t
> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
> 
> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08
> 
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>   drivers/nvdimm/pmem.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ceea55f621cc..38defe84de4c 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -31,6 +31,10 @@
>   #include "pfn.h"
>   #include "nd.h"
>   
> +static bool g_nowait;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
> +
>   static struct device *to_dev(struct pmem_device *pmem)
>   {
>   	/*
> @@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
>   	blk_queue_max_hw_sectors(q, UINT_MAX);
>   	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>   	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
> +	if (g_nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
>   	if (pmem->pfn_flags & PFN_MAP)
>   		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
>
Dan Williams May 15, 2023, 11:53 p.m. UTC | #6
Jane Chu wrote:
> Hi,
> 
> Does it make sense to mark this patch a candidate for the stable branch?

These numbers:

> * linux-block (for-next) # grep slat  pmem*fio | column -t
> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
> 
> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08

show there is a potential regression in min latency. So I would like to
see this patch upstream and shipping for a while before flagging it for
backport.
Jane Chu May 16, 2023, 5:58 p.m. UTC | #7
On 5/15/2023 4:53 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Hi,
>>
>> Does it make sense to mark this patch a candidate for the stable branch?
> 
> These numbers:
> 
>> * linux-block (for-next) # grep slat  pmem*fio | column -t
>> nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
>> nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
>> nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
>>
>> nowait-on-1.fio:   slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
>> nowait-on-2.fio:   slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
>> nowait-on-3.fio:   slat  (nsec):  min=1123,  max=40193k,  avg=7409.08
> 
> show there is a potential regression in min latency. So I would like to
> see this patch upstream and shipping for a while before flagging it for
> backport.

Good point, sorry I missed that.

I'm curious why the minimum submission latency is ~3 times longer with
QUEUE_FLAG_NOWAIT.

thanks,
-jane

Patch

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ceea55f621cc..38defe84de4c 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -31,6 +31,10 @@ 
 #include "pfn.h"
 #include "nd.h"
 
+static bool g_nowait;
+module_param_named(nowait, g_nowait, bool, 0444);
+MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
 	/*
@@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
 	blk_queue_max_hw_sectors(q, UINT_MAX);
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
+	if (g_nowait)
+		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
 	if (pmem->pfn_flags & PFN_MAP)
 		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
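
A note on interpreting the numbers in this thread: when a block device
does not advertise QUEUE_FLAG_NOWAIT, io_uring cannot issue the I/O
inline and punts every request to its io-wq worker threads, which is
presumably what the ~220-million context switches in the nowait-off runs
reflect; with the flag set, submission completes inline and ctx drops to
a few tens of thousands. This also suggests an answer to the min-latency
question above: inline submission now includes the device copy itself,
whereas a punted submission returns as soon as the request is queued to
a worker. A minimal liburing reproducer for a single read, with the
device path as a placeholder, might look roughly like:

	#define _GNU_SOURCE		/* for O_DIRECT */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <liburing.h>

	#define BS 4096

	int main(int argc, char **argv)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		void *buf;
		int fd;

		/* /dev/pmem0 is a placeholder pmem namespace */
		fd = open(argc > 1 ? argv[1] : "/dev/pmem0",
			  O_RDONLY | O_DIRECT);
		if (fd < 0 || posix_memalign(&buf, BS, BS) ||
		    io_uring_queue_init(8, &ring, 0) < 0) {
			perror("setup");
			return 1;
		}

		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fd, buf, BS, 0);
		io_uring_submit(&ring);

		/*
		 * Without QUEUE_FLAG_NOWAIT this read is punted to an
		 * io-wq worker; with the flag it completes inline.
		 */
		if (io_uring_wait_cqe(&ring, &cqe) == 0) {
			printf("res=%d\n", cqe->res);
			io_uring_cqe_seen(&ring, cqe);
		}
		io_uring_queue_exit(&ring);
		return 0;
	}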