Message ID | 20230512104302.8527-2-kch@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | pmem: allow user to set QUEUE_FLAG_NOWAIT |
Chaitanya Kulkarni wrote:
> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
> parameter to retain the default behaviour. Also, update respective
> allocation flags in the write path. Following are the performance
> numbers with io_uring fio engine for random read, note that device has
> been populated fully with randwrite workload before taking these
> numbers :-

I'm not seeing any comparison with/without the option you propose?  I
assume there is some performance improvement you are trying to show?

>
> * linux-block (for-next) # grep IOPS pmem*fio | column -t
>
> nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
> nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
> nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s
>
> nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
> nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
> nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s
>
> * linux-block (for-next) # grep cpu pmem*fio | column -t
>
> nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
> nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
> nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158
>
> nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
> nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
> nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237
>
> * linux-block (for-next) # grep slat pmem*fio | column -t
>
> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>
> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>  drivers/nvdimm/pmem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ceea55f621cc..38defe84de4c 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -31,6 +31,10 @@
>  #include "pfn.h"
>  #include "nd.h"
>
> +static bool g_nowait;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");

Module parameters should be avoided.  Since I'm not clear on the
performance benefit I can't comment on alternatives.  But I strongly
suspect that this choice is not going to be desired for all devices
always.

Ira

> +
>  static struct device *to_dev(struct pmem_device *pmem)
>  {
>  	/*
> @@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
>  	blk_queue_max_hw_sectors(q, UINT_MAX);
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>  	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
> +	if (g_nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
>  	if (pmem->pfn_flags & PFN_MAP)
>  		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
>
> --
> 2.40.0
>
Chaitanya Kulkarni wrote:
> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
> parameter to retain the default behaviour. Also, update respective
> allocation flags in the write path. Following are the performance
> numbers with io_uring fio engine for random read, note that device has
> been populated fully with randwrite workload before taking these
> numbers :-

Numbers look good.  I see no reason for this to be optional.  Just like
the brd driver always sets NOWAIT, so should pmem.

>
> * linux-block (for-next) # grep IOPS pmem*fio | column -t
>
> nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
> nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
> nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s
>
> nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
> nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
> nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s
>
> * linux-block (for-next) # grep cpu pmem*fio | column -t
>
> nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
> nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
> nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158
>
> nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
> nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
> nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237
>
> * linux-block (for-next) # grep slat pmem*fio | column -t
>
> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>
> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08

Any thoughts on why min latency went up?
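For reference, the brd precedent mentioned above looks roughly like the
sketch below (paraphrased from memory of drivers/block/brd.c's disk setup;
treat the exact lines and their placement as an assumption rather than a
quotation of the current source):

	/* brd advertises nowait support unconditionally on its queue */
	blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);

If memory serves, brd also drops to a non-blocking GFP mask for its page
allocations when a bio carries REQ_NOWAIT, which is what makes the
unconditional flag safe there.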
On 5/12/23 10:14, Ira Weiny wrote:
> Chaitanya Kulkarni wrote:
>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>> parameter to retain the default behaviour. Also, update respective
>> allocation flags in the write path. Following are the performance
>> numbers with io_uring fio engine for random read, note that device has
>> been populated fully with randwrite workload before taking these
>> numbers :-
> I'm not seeing any comparison with/without the option you propose?  I
> assume there is some performance improvement you are trying to show?

Not needed, see below.

>
>> * linux-block (for-next) # grep IOPS pmem*fio | column -t
>>
>> nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
>> nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
>> nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s
>>
>> nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
>> nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
>> nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s
>>
>> * linux-block (for-next) # grep cpu pmem*fio | column -t
>>
>> nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
>> nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
>> nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158
>>
>> nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
>> nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
>> nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237
>>
>> * linux-block (for-next) # grep slat pmem*fio | column -t
>>
>> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
>> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
>> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>>
>> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
>> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
>> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08
>>
>> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
>> ---
>>  drivers/nvdimm/pmem.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
>> index ceea55f621cc..38defe84de4c 100644
>> --- a/drivers/nvdimm/pmem.c
>> +++ b/drivers/nvdimm/pmem.c
>> @@ -31,6 +31,10 @@
>>  #include "pfn.h"
>>  #include "nd.h"
>>
>> +static bool g_nowait;
>> +module_param_named(nowait, g_nowait, bool, 0444);
>> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
> Module parameters should be avoided.  Since I'm not clear on the
> performance benefit I can't comment on alternatives.  But I strongly
> suspect that this choice is not going to be desired for all devices
> always.
>
> Ira

Me neither, that is why I've added it: since I don't have access to all
the devices, I cannot cover all the cases to generate quantitative data.
Sending out v2 without the mod param.

-ck
On 5/12/23 11:54, Dan Williams wrote:
> Chaitanya Kulkarni wrote:
>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
>> parameter to retain the default behaviour. Also, update respective
>> allocation flags in the write path. Following are the performance
>> numbers with io_uring fio engine for random read, note that device has
>> been populated fully with randwrite workload before taking these
>> numbers :-
> Numbers look good.  I see no reason for this to be optional.  Just like
> the brd driver always sets NOWAIT, so should pmem.

Yes, sending out v2 without the mod param.

>
>> * linux-block (for-next) # grep IOPS pmem*fio | column -t
>>
>> nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
>> nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
>> nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s
>>
>> nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
>> nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
>> nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s
>>
>> * linux-block (for-next) # grep cpu pmem*fio | column -t
>>
>> nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
>> nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
>> nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158
>>
>> nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
>> nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
>> nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237
>>
>> * linux-block (for-next) # grep slat pmem*fio | column -t
>>
>> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
>> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
>> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>>
>> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
>> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
>> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08
> Any thoughts on why min latency went up?

Really not sure why, will need some more time to investigate that.

-ck
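The v2 referred to above is not part of this archive. As a minimal sketch
of what dropping the module parameter implies (an assumption, not the
actual v2 patch), pmem_attach_disk() would set the flag unconditionally
alongside the other queue flags already shown in the diff below:

	blk_queue_max_hw_sectors(q, UINT_MAX);
	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
	/* always advertise nowait support, matching brd */
	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
	if (pmem->pfn_flags & PFN_MAP)
		blk_queue_flag_set(QUEUE_FLAG_DAX, q);

With that, there is no knob to configure: the queue flag only advertises
capability, and callers opt in per I/O by submitting bios with REQ_NOWAIT.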
Hi,

Does it make sense to mark this patch a candidate for stable branch?

thanks!
-jane

On 5/12/2023 3:43 AM, Chaitanya Kulkarni wrote:
> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
> parameter to retain the default behaviour. Also, update respective
> allocation flags in the write path. Following are the performance
> numbers with io_uring fio engine for random read, note that device has
> been populated fully with randwrite workload before taking these
> numbers :-
>
> * linux-block (for-next) # grep IOPS pmem*fio | column -t
>
> nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
> nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
> nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s
>
> nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
> nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
> nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s
>
> * linux-block (for-next) # grep cpu pmem*fio | column -t
>
> nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
> nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
> nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158
>
> nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
> nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
> nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237
>
> * linux-block (for-next) # grep slat pmem*fio | column -t
>
> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>
> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08
>
> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
> ---
>  drivers/nvdimm/pmem.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index ceea55f621cc..38defe84de4c 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -31,6 +31,10 @@
>  #include "pfn.h"
>  #include "nd.h"
>
> +static bool g_nowait;
> +module_param_named(nowait, g_nowait, bool, 0444);
> +MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
> +
>  static struct device *to_dev(struct pmem_device *pmem)
>  {
>  	/*
> @@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
>  	blk_queue_max_hw_sectors(q, UINT_MAX);
>  	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>  	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
> +	if (g_nowait)
> +		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
>  	if (pmem->pfn_flags & PFN_MAP)
>  		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
>
Jane Chu wrote:
> Hi,
>
> Does it make sense to mark this patch a candidate for stable branch?

These numbers:

> * linux-block (for-next) # grep slat pmem*fio | column -t
>
> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>
> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08

show there is a potential for a regression for min latency.  So I would
like to see this patch upstream and shipping for a while before flagging
it for backport.
On 5/15/2023 4:53 PM, Dan Williams wrote:
> Jane Chu wrote:
>> Hi,
>>
>> Does it make sense to mark this patch a candidate for stable branch?
>
> These numbers:
>
>> * linux-block (for-next) # grep slat pmem*fio | column -t
>>
>> nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
>> nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
>> nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24
>>
>> nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
>> nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
>> nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08
>
> show there is a potential for a regression for min latency.  So I would
> like to see this patch upstream and shipping for a while before flagging
> it for backport.

Good point, sorry I missed noticing that.  I'm curious as to why the
minimum submission latency is ~3 times as long with QUEUE_FLAG_NOWAIT.

thanks,
-jane
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ceea55f621cc..38defe84de4c 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -31,6 +31,10 @@
 #include "pfn.h"
 #include "nd.h"
 
+static bool g_nowait;
+module_param_named(nowait, g_nowait, bool, 0444);
+MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
 	/*
@@ -543,6 +547,8 @@ static int pmem_attach_disk(struct device *dev,
 	blk_queue_max_hw_sectors(q, UINT_MAX);
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
+	if (g_nowait)
+		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
 	if (pmem->pfn_flags & PFN_MAP)
 		blk_queue_flag_set(QUEUE_FLAG_DAX, q);
 
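Setting QUEUE_FLAG_NOWAIT is a promise that the submit path will not sleep
when a bio carries REQ_NOWAIT; pmem can make that promise because its bio
handling is essentially a memcpy loop with no per-bio blocking allocations.
For a bio-based driver that *can* block, the usual pattern is the generic
sketch below (not pmem code; example_try_resources() is a made-up
placeholder for a trylock or GFP_NOWAIT allocation, while REQ_NOWAIT and
bio_wouldblock_error() are the real block-layer names):

#include <linux/bio.h>
#include <linux/blkdev.h>

static bool example_try_resources(void);	/* hypothetical non-blocking attempt */

static void example_submit_bio(struct bio *bio)
{
	/*
	 * REQ_NOWAIT callers (e.g. io_uring) would rather get an
	 * -EAGAIN style completion than have the submitter sleep here.
	 */
	if ((bio->bi_opf & REQ_NOWAIT) && !example_try_resources()) {
		/* completes the bio with BLK_STS_AGAIN */
		bio_wouldblock_error(bio);
		return;
	}

	/* ... otherwise process the bio and complete it ... */
	bio_endio(bio);
}

When a queue does not advertise NOWAIT support at all, io_uring ends up
punting such I/O to its worker threads instead of issuing it inline, which
lines up with the much higher context-switch counts in the nowait-off runs
quoted above.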
Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
parameter to retain the default behaviour. Also, update respective
allocation flags in the write path. Following are the performance
numbers with io_uring fio engine for random read, note that device has
been populated fully with randwrite workload before taking these
numbers :-

* linux-block (for-next) # grep IOPS pmem*fio | column -t

nowait-off-1.fio: read: IOPS=3968k, BW=15.1GiB/s
nowait-off-2.fio: read: IOPS=4084k, BW=15.6GiB/s
nowait-off-3.fio: read: IOPS=3995k, BW=15.2GiB/s

nowait-on-1.fio:  read: IOPS=5909k, BW=22.5GiB/s
nowait-on-2.fio:  read: IOPS=5997k, BW=22.9GiB/s
nowait-on-3.fio:  read: IOPS=6006k, BW=22.9GiB/s

* linux-block (for-next) # grep cpu pmem*fio | column -t

nowait-off-1.fio: cpu : usr=6.38%,  sys=31.37%, ctx=220427659
nowait-off-2.fio: cpu : usr=6.19%,  sys=31.45%, ctx=229825635
nowait-off-3.fio: cpu : usr=6.17%,  sys=31.22%, ctx=221896158

nowait-on-1.fio:  cpu : usr=10.56%, sys=87.82%, ctx=24730
nowait-on-2.fio:  cpu : usr=9.92%,  sys=88.36%, ctx=23427
nowait-on-3.fio:  cpu : usr=9.85%,  sys=89.04%, ctx=23237

* linux-block (for-next) # grep slat pmem*fio | column -t

nowait-off-1.fio: slat (nsec): min=431,  max=50423k, avg=9424.06
nowait-off-2.fio: slat (nsec): min=420,  max=35992k, avg=9193.94
nowait-off-3.fio: slat (nsec): min=430,  max=40737k, avg=9244.24

nowait-on-1.fio:  slat (nsec): min=1232, max=40098k, avg=7518.60
nowait-on-2.fio:  slat (nsec): min=1303, max=52107k, avg=7423.37
nowait-on-3.fio:  slat (nsec): min=1123, max=40193k, avg=7409.08

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/nvdimm/pmem.c | 6 ++++++
 1 file changed, 6 insertions(+)
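The fio job files referenced above (nowait-off-*.fio / nowait-on-*.fio) are
not part of this archive. A job along the lines the commit message
describes might look like the sketch below; only the io_uring engine, the
randread workload and the prior full randwrite preconditioning come from
the commit message, while the device node, block size, queue depth, job
count and runtime are assumptions:

; precondition the namespace once with rw=randwrite over the whole device,
; then run this job with the module loaded nowait=0 / nowait=1
[global]
; assumed device node
filename=/dev/pmem0
ioengine=io_uring
direct=1
rw=randread
; assumed values below
bs=4k
iodepth=32
numjobs=32
time_based
runtime=60
group_reporting

[randread]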