Message ID | 20230512082958.6550-2-kch@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | zram: queue flag nowait and minor cleanup |
On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote: > Allow user to set the QUEUE_FLAG_NOWAIT optionally using module > parameter to retain the default behaviour. Also, update respective > allocation flags in the write path. Following are the performance > numbers with io_uring fio engine for random read, note that device has > been populated fully with randwrite workload before taking these > numbers :- Why would you add a module option, except to make everyone's life hell?
On 5/12/23 8:31 AM, Christoph Hellwig wrote: > On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote: >> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module >> parameter to retain the default behaviour. Also, update respective >> allocation flags in the write path. Following are the performance >> numbers with io_uring fio engine for random read, note that device has >> been populated fully with randwrite workload before taking these >> numbers :- > > Why would you add a module option, except to make everyone's life hell? Yeah that makes no sense. Either the driver is nowait compatible and it should just set the flag, or it's not.
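To make the point above concrete: "just set the flag" is a one-liner at disk creation time. A minimal sketch, assuming a zram-like driver; blk_queue_flag_set() and QUEUE_FLAG_NOWAIT are real kernel APIs of this era (~v6.4), while the helper name is hypothetical:

#include <linux/blkdev.h>

/*
 * Sketch: a driver whose submission path never blocks (or fails
 * REQ_NOWAIT bios with BLK_STS_AGAIN) advertises that unconditionally
 * when the disk is created -- no module parameter involved.
 */
static void mark_nowait_capable(struct gendisk *disk)
{
	blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
}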
On 5/12/23 07:31, Christoph Hellwig wrote: > On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote: >> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module >> parameter to retain the default behaviour. Also, update respective >> allocation flags in the write path. Following are the performance >> numbers with io_uring fio engine for random read, note that device has >> been populated fully with randwrite workload before taking these >> numbers :- > Why would you add a module option, except to make everyone's life hell? Send v2 without modparam. -ck
On 5/12/23 07:34, Jens Axboe wrote: > On 5/12/23 8:31 AM, Christoph Hellwig wrote: >> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote: >>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module >>> parameter to retain the default behaviour. Also, update respective >>> allocation flags in the write path. Following are the performance >>> numbers with io_uring fio engine for random read, note that device has >>> been populated fully with randwrite workload before taking these >>> numbers :- >> Why would you add a module option, except to make everyone's life hell? > Yeah that makes no sense. Either the driver is nowait compatible and > it should just set the flag, or it's not. Send v2 without modparam. -ck
Hi Jens/Christoph, On 5/12/23 18:06, Chaitanya Kulkarni wrote: > On 5/12/23 07:34, Jens Axboe wrote: >> On 5/12/23 8:31 AM, Christoph Hellwig wrote: >>> On Fri, May 12, 2023 at 01:29:56AM -0700, Chaitanya Kulkarni wrote: >>>> Allow user to set the QUEUE_FLAG_NOWAIT optionally using module >>>> parameter to retain the default behaviour. Also, update respective >>>> allocation flags in the write path. Following are the performance >>>> numbers with io_uring fio engine for random read, note that device has >>>> been populated fully with randwrite workload before taking these >>>> numbers :- >>> Why would you add a module option, except to make everyone's life hell? >> Yeah that makes no sense. Either the driver is nowait compatible and >> it should just set the flag, or it's not. > Send v2 without modparam. > -ck

Removed modparam, v2 is ready to send, but I have a few concerns about enabling nowait unconditionally for zram :-

From brd data [1] and zram data [2] from my setup :-

       IOPs (old->new)      | sys cpu% (old->new)
--------------------------------------------------
brd  | 1.5x (3919 -> 5874)  | 3x (29 -> 87)
zram | 1.09x (833 -> 917)   | 8.8x (11 -> 97)

brd:-
IOPs increased by ~1.5 times (50% up)
sys CPU percentage increased by ~3.0 times (200% up)

zram:-
IOPs increased by ~1.09 times ( 9% up)
sys CPU percentage increased by ~8.81 times (781% up)

This comparison clearly demonstrates that zram experiences a much more substantial CPU load relative to the increase in IOPs compared to brd. Such a significant difference might suggest a potential CPU regression in zram ?

Especially for zram, if applications are not expecting this high cpu usage then we'll get regression reports with the default nowait approach. How about we avoid something like this with one of the following options ?

1. Provide a fix with the module parameter (already NACKed).

2. Allow the user to configure nowait from the command line using zramctl, with QUEUE_FLAG_NOWAIT disabled by default.

3. Add a generic block layer sysfs attr nowait, like nomerges, since similar changes I've posted for pmem [3] and bcache [4] have the same issue. This generic way we avoid duplicating code in each driver and allow the user the freedom to set or unset it based on their workload, with QUEUE_FLAG_NOWAIT disabled by default (see the rough sketch after the data below).

or please suggest any other way you guys think is appropriate ...
-ck

[1] brd nowait off vs nowait on :-

linux-block (zram-nowait) # grep cpu brd-*fio | column -t
brd-default-nowait-off-1.fio: cpu : usr=6.34%, sys=29.84%, ctx=216249754,
brd-default-nowait-off-2.fio: cpu : usr=6.41%, sys=29.83%, ctx=217773657,
brd-default-nowait-off-3.fio: cpu : usr=6.37%, sys=30.05%, ctx=222667703,
brd-nowait-on-1.fio: cpu : usr=10.18%, sys=88.35%, ctx=23221,
brd-nowait-on-2.fio: cpu : usr=10.02%, sys=86.82%, ctx=22396,
brd-nowait-on-3.fio: cpu : usr=10.17%, sys=86.29%, ctx=22207,

linux-block (zram-nowait) # grep IOPS brd-*fio | column -t
brd-default-nowait-off-1.fio: read: IOPS=3872k, BW=14.8GiB/s
brd-default-nowait-off-2.fio: read: IOPS=3933k, BW=15.0GiB/s
brd-default-nowait-off-3.fio: read: IOPS=3953k, BW=15.1GiB/s
brd-nowait-on-1.fio: read: IOPS=5884k, BW=22.4GiB/s
brd-nowait-on-2.fio: read: IOPS=5870k, BW=22.4GiB/s
brd-nowait-on-3.fio: read: IOPS=5870k, BW=22.4GiB/s

linux-block (zram-nowait) # grep slat brd-*fio | column -t
brd-default-nowait-off-1.fio: slat (nsec): min=440, max=56072k, avg=9579.17
brd-default-nowait-off-2.fio: slat (nsec): min=440, max=42743k, avg=9468.83
brd-default-nowait-off-3.fio: slat (nsec): min=431, max=32493k, avg=9532.96
brd-nowait-on-1.fio: slat (nsec): min=1523, max=37786k, avg=7596.58
brd-nowait-on-2.fio: slat (nsec): min=1503, max=40101k, avg=7612.64
brd-nowait-on-3.fio: slat (nsec): min=1463, max=37298k, avg=7610.89

[2] zram nowait off vs nowait on :-

linux-block (zram-nowait) # grep IOPS zram-*fio | column -t
zram-default-nowait-off-1.fio: read: IOPS=833k, BW=3254MiB/s
zram-default-nowait-off-2.fio: read: IOPS=845k, BW=3301MiB/s
zram-default-nowait-off-3.fio: read: IOPS=845k, BW=3301MiB/s
zram-nowait-on-1.fio: read: IOPS=917k, BW=3582MiB/s
zram-nowait-on-2.fio: read: IOPS=914k, BW=3569MiB/s
zram-nowait-on-3.fio: read: IOPS=917k, BW=3581MiB/s

linux-block (zram-nowait) # grep cpu zram-*fio | column -t
zram-default-nowait-off-1.fio: cpu : usr=5.18%, sys=11.31%, ctx=39945072
zram-default-nowait-off-2.fio: cpu : usr=5.20%, sys=11.49%, ctx=40591907
zram-default-nowait-off-3.fio: cpu : usr=5.31%, sys=11.86%, ctx=40252142
zram-nowait-on-1.fio: cpu : usr=1.87%, sys=97.05%, ctx=24337
zram-nowait-on-2.fio: cpu : usr=1.83%, sys=97.20%, ctx=21452
zram-nowait-on-3.fio: cpu : usr=1.84%, sys=97.20%, ctx=21051

linux-block (zram-nowait) # grep slat zram-*fio | column -t
zram-default-nowait-off-1.fio: slat (nsec): min=420, max=6960.6k, avg=1859.09
zram-default-nowait-off-2.fio: slat (nsec): min=411, max=5387.7k, avg=1848.79
zram-default-nowait-off-3.fio: slat (nsec): min=410, max=6914.2k, avg=1902.51
zram-nowait-on-1.fio: slat (nsec): min=1092, max=32268k, avg=51605.49
zram-nowait-on-2.fio: slat (nsec): min=1052, max=7676.3k, avg=51806.82
zram-nowait-on-3.fio: slat (nsec): min=1062, max=10444k, avg=51625.89

[3] https://lore.kernel.org/nvdimm/b90ff1ba-b22c-bb37-db0a-4c46bb5f2a06@nvidia.com/T/#t
[4] https://marc.info/?l=linux-bcache&m=168388522305946&w=1
    https://marc.info/?l=linux-bcache&m=168401821129652&w=2
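A rough sketch of option 3 above, modelled on how the existing bit attributes (nonrot, iostats, stable_writes) are wired up in block/blk-sysfs.c around this time; the nowait entry itself is hypothetical and was never merged:

/* block/blk-sysfs.c: next to the existing QUEUE_SYSFS_BIT_FNS users */
QUEUE_SYSFS_BIT_FNS(nowait, NOWAIT, 0);	/* generates queue_nowait_show()/queue_nowait_store() */
QUEUE_RW_ENTRY(queue_nowait, "nowait");

/*
 * Adding &queue_nowait_entry.attr to queue_attrs[] would then expose
 * /sys/block/<dev>/queue/nowait, read-write, default 0 (off), so users
 * opt in per device instead of each driver growing a modparam.
 */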
On (23/05/16 05:51), Chaitanya Kulkarni wrote: > Removed modparam, v2 is ready to send, but I have a few concerns about enabling > nowait unconditionally for zram :- > > From brd data [1] and zram data [2] from my setup :- > > IOPs (old->new) | sys cpu% (old->new) > -------------------------------------------------- > brd | 1.5x (3919 -> 5874) | 3x (29 -> 87) > zram | 1.09x (833 -> 917) | 8.8x (11 -> 97) > > brd:- > IOPs increased by ~1.5 times (50% up) > sys CPU percentage increased by ~3.0 times (200% up) > > zram:- > IOPs increased by ~1.09 times ( 9% up) > sys CPU percentage increased by ~8.81 times (781% up) > > This comparison clearly demonstrates that zram experiences a much more > substantial CPU load relative to the increase in IOPs compared to brd. > Such a significant difference might suggest a potential CPU regression > in zram ? > > Especially for zram, if applications are not expecting this high cpu > usage then we'll get regression reports with the default nowait > approach. How about we avoid something like this with one of the > following options ? Well, zram performs decompression/compression on the CPU (per-CPU crypto streams) for each IO operation, so zram IO is CPU intensive.
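To make that cost concrete: every read must decompress the stored object before the page can be completed, on a stream that is pinned per CPU. A paraphrased sketch, not zram's literal code — zcomp_stream_get/put() and zcomp_decompress() are zram's real helpers, while zram_object()/zram_obj_size() are hypothetical stand-ins for the zsmalloc lookups:

static int zram_read_sketch(struct zram *zram, struct page *page, u32 index)
{
	/* grabs this CPU's compression stream -- serialized per CPU */
	struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
	int ret;

	/* the actual CPU burn: decompress the object into the page */
	ret = zcomp_decompress(zstrm, zram_object(zram, index),
			       zram_obj_size(zram, index),
			       page_address(page));
	zcomp_stream_put(zram->comp);
	return ret;
}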
On 5/16/23 06:08, Sergey Senozhatsky wrote: > On (23/05/16 05:51), Chaitanya Kulkarni wrote: >> Removed modparam, v2 is ready to send, but I have a few concerns about enabling >> nowait unconditionally for zram :- >> >> From brd data [1] and zram data [2] from my setup :- >> >> IOPs (old->new) | sys cpu% (old->new) >> -------------------------------------------------- >> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87) >> zram | 1.09x (833 -> 917) | 8.8x (11 -> 97) >> >> brd:- >> IOPs increased by ~1.5 times (50% up) >> sys CPU percentage increased by ~3.0 times (200% up) >> >> zram:- >> IOPs increased by ~1.09 times ( 9% up) >> sys CPU percentage increased by ~8.81 times (781% up) >> >> This comparison clearly demonstrates that zram experiences a much more >> substantial CPU load relative to the increase in IOPs compared to brd. >> Such a significant difference might suggest a potential CPU regression >> in zram ? >> >> Especially for zram, if applications are not expecting this high cpu >> usage then we'll get regression reports with the default nowait >> approach. How about we avoid something like this with one of the >> following options ? > Well, zram performs decompression/compression on the CPU (per-CPU > crypto streams) for each IO operation, so zram IO is CPU intensive. and that is exactly why I've raised this issue, are you okay with that ? I'll send V2 with nowait enabled by default .. -ck
On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote: > On 5/16/23 06:08, Sergey Senozhatsky wrote: >> On (23/05/16 05:51), Chaitanya Kulkarni wrote: >>> Removed modparam, v2 is ready to send, but I have a few concerns about enabling >>> nowait unconditionally for zram :- >>> >>> From brd data [1] and zram data [2] from my setup :- >>> >>> IOPs (old->new) | sys cpu% (old->new) >>> -------------------------------------------------- >>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87) >>> zram | 1.09x (833 -> 917) | 8.8x (11 -> 97) >>> >>> brd:- >>> IOPs increased by ~1.5 times (50% up) >>> sys CPU percentage increased by ~3.0 times (200% up) >>> >>> zram:- >>> IOPs increased by ~1.09 times ( 9% up) >>> sys CPU percentage increased by ~8.81 times (781% up) >>> >>> This comparison clearly demonstrates that zram experiences a much more >>> substantial CPU load relative to the increase in IOPs compared to brd. >>> Such a significant difference might suggest a potential CPU regression >>> in zram ? >>> >>> Especially for zram, if applications are not expecting this high cpu >>> usage then we'll get regression reports with the default nowait >>> approach. How about we avoid something like this with one of the >>> following options ? >> Well, zram performs decompression/compression on the CPU (per-CPU >> crypto streams) for each IO operation, so zram IO is CPU intensive. > and that is exactly why I've raised this issue, are you okay with that ? > I'll send V2 with nowait enabled by default .. Did you check that it's actually nowait sane to begin with? I spent literally 30 seconds on that when you sent the first patch, and the partial/sync path does not look kosher.
On 5/16/23 13:43, Jens Axboe wrote: > On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote: >> On 5/16/23 06:08, Sergey Senozhatsky wrote: >>> On (23/05/16 05:51), Chaitanya Kulkarni wrote: >>>> Removed modparam, v2 is ready to send, but I have a few concerns about enabling >>>> nowait unconditionally for zram :- >>>> >>>> From brd data [1] and zram data [2] from my setup :- >>>> >>>> IOPs (old->new) | sys cpu% (old->new) >>>> -------------------------------------------------- >>>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87) >>>> zram | 1.09x (833 -> 917) | 8.8x (11 -> 97) >>>> >>>> brd:- >>>> IOPs increased by ~1.5 times (50% up) >>>> sys CPU percentage increased by ~3.0 times (200% up) >>>> >>>> zram:- >>>> IOPs increased by ~1.09 times ( 9% up) >>>> sys CPU percentage increased by ~8.81 times (781% up) >>>> >>>> This comparison clearly demonstrates that zram experiences a much more >>>> substantial CPU load relative to the increase in IOPs compared to brd. >>>> Such a significant difference might suggest a potential CPU regression >>>> in zram ? >>>> >>>> Especially for zram, if applications are not expecting this high cpu >>>> usage then we'll get regression reports with the default nowait >>>> approach. How about we avoid something like this with one of the >>>> following options ? >>> Well, zram performs decompression/compression on the CPU (per-CPU >>> crypto streams) for each IO operation, so zram IO is CPU intensive. >> and that is exactly why I've raised this issue, are you okay with that ? >> I'll send V2 with nowait enabled by default .. > Did you check that it's actually nowait sane to begin with? I spent > literally 30 seconds on that when you sent the first patch, and the > partial/sync path does not look kosher.

I did check for it; it needs nowait-sane preparation in V2 with something like this [1], plus returning an error with bio_wouldblock_error() when we end up in read_from_bdev_sync() and it is not called from writeback_store() (rough changes [1]). But after taking performance numbers repeatedly, the main concern I failed to resolve is the numbers above and enabling nowait by default for zram ...

-ck

[1]

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 08d198431a88..c2f154911cc0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -55,7 +55,7 @@ static const struct block_device_operations zram_devops;
 
 static void zram_free_page(struct zram *zram, size_t index);
 static int zram_read_page(struct zram *zram, struct page *page, u32 index,
-			  struct bio *parent, blk_opf_t opf);
+			  struct bio *parent, bool nowait);
 
 static int zram_slot_trylock(struct zram *zram, u32 index)
 {
@@ -690,7 +690,7 @@ static ssize_t writeback_store(struct device *dev,
 		/* Need for hugepage writeback racing */
 		zram_set_flag(zram, index, ZRAM_IDLE);
 		zram_slot_unlock(zram, index);
-		if (zram_read_page(zram, page, index, NULL, 0)) {
+		if (zram_read_page(zram, page, index, NULL, false)) {
 			zram_slot_lock(zram, index);
 			zram_clear_flag(zram, index, ZRAM_UNDER_WB);
 			zram_clear_flag(zram, index, ZRAM_IDLE);
@@ -772,6 +772,7 @@ struct zram_work {
 	unsigned long entry;
 	struct page *page;
 	int error;
+	bool nowait;
 };
 
@@ -779,8 +780,9 @@ static void zram_sync_read(struct work_struct *work)
 	struct zram_work *zw = container_of(work, struct zram_work, work);
 	struct bio_vec bv;
 	struct bio bio;
+	blk_opf_t nowait = zw->nowait ? REQ_NOWAIT : 0;
 
-	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
+	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ | nowait);
 	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
 	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
 	zw->error = submit_bio_wait(&bio);
@@ -792,18 +794,14 @@ static int read_from_bdev_sync(struct zram *zram, struct page *page,
  * use a worker thread context.
  */
 static int read_from_bdev_sync(struct zram *zram, struct page *page,
-			       unsigned long entry, blk_opf_t opf)
+			       unsigned long entry, bool nowait)
 {
 	struct zram_work work;
 
 	work.page = page;
 	work.zram = zram;
 	work.entry = entry;
-
-	if (opf & REQ_NOWAIT) {
-		zram_sync_read(&work);
-		return work.error;
-	}
+	work.nowait = nowait;
 
 	INIT_WORK_ONSTACK(&work.work, zram_sync_read);
 	queue_work(system_unbound_wq, &work.work);
@@ -814,21 +812,21 @@ static int read_from_bdev_sync(struct zram *zram, struct page *page,
 }
 
 static int read_from_bdev(struct zram *zram, struct page *page,
-			  unsigned long entry, struct bio *parent, blk_opf_t opf)
+			  unsigned long entry, struct bio *parent, bool nowait)
 {
 	atomic64_inc(&zram->stats.bd_reads);
 	if (!parent) {
 		if (WARN_ON_ONCE(!IS_ENABLED(ZRAM_PARTIAL_IO)))
 			return -EIO;
-		return read_from_bdev_sync(zram, page, entry, opf);
+		return read_from_bdev_sync(zram, page, entry, nowait);
 	}
-	read_from_bdev_async(zram, page, entry, parent, opf);
+	read_from_bdev_async(zram, page, entry, parent, nowait);
 	return 0;
 }
 #else
 static inline void reset_bdev(struct zram *zram) {};
 static int read_from_bdev(struct zram *zram, struct page *page,
-			  unsigned long entry, struct bio *parent, blk_opf_t opf)
+			  unsigned long entry, struct bio *parent, bool nowait)
 {
 	return -EIO;
 }
@@ -1361,7 +1359,7 @@ static int zram_read_from_zspool(struct zram *zram, struct page *page,
 }
 
 static int zram_read_page(struct zram *zram, struct page *page, u32 index,
-			  struct bio *parent, blk_opf_t bi_opf)
+			  struct bio *parent, bool nowait)
 {
 	int ret;
 
@@ -1378,7 +1376,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 		zram_slot_unlock(zram, index);
 
 		ret = read_from_bdev(zram, page, zram_get_element(zram, index),
-				     parent, bi_opf);
+				     parent, nowait);
 	}
 
 	/* Should NEVER happen. Return bio error if it does. */
@@ -1395,13 +1393,14 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 static int zram_bvec_read_partial(struct zram *zram, struct bio_vec *bvec,
 				  u32 index, int offset, struct bio *bio)
 {
+	bool nowait = bio->bi_opf & REQ_NOWAIT;
 	struct page *page;
 	int ret;
 
-	page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
+	page = alloc_page(nowait ? GFP_NOWAIT : GFP_NOIO);
 	if (!page)
 		return -ENOMEM;
-	ret = zram_read_page(zram, page, index, NULL, bio->bi_opf);
+	ret = zram_read_page(zram, page, index, NULL, nowait);
 	if (likely(!ret))
 		memcpy_to_bvec(bvec, page_address(page) + offset);
 	__free_page(page);
@@ -1413,7 +1412,8 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 {
 	if (is_partial_io(bvec))
 		return zram_bvec_read_partial(zram, bvec, index, offset, bio);
-	return zram_read_page(zram, bvec->bv_page, index, bio, bio->bi_opf);
+	return zram_read_page(zram, bvec->bv_page, index, bio,
+			      bio->bi_opf & REQ_NOWAIT);
 }
 
 static int zram_write_page(struct zram *zram, struct page *page, u32 index)
@@ -1547,14 +1547,15 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
 				   u32 index, int offset, struct bio *bio)
 {
+	bool nowait = bio->bi_opf & REQ_NOWAIT;
 	struct page *page;
 	int ret;
 
-	page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
+	page = alloc_page(nowait ? GFP_NOWAIT : GFP_NOIO);
 	if (!page)
 		return -ENOMEM;
 
-	ret = zram_read_page(zram, page, index, bio, bio->bi_opf);
+	ret = zram_read_page(zram, page, index, bio, nowait);
 	if (!ret) {
 		memcpy_from_bvec(page_address(page) + offset, bvec);
 		ret = zram_write_page(zram, page, index);
On 5/16/23 3:32 PM, Chaitanya Kulkarni wrote: > On 5/16/23 13:43, Jens Axboe wrote: >> On 5/16/23 2:41 PM, Chaitanya Kulkarni wrote: >>> On 5/16/23 06:08, Sergey Senozhatsky wrote: >>>> On (23/05/16 05:51), Chaitanya Kulkarni wrote: >>>>> Removed modparam, v2 is ready to send, but I have a few concerns about enabling >>>>> nowait unconditionally for zram :- >>>>> >>>>> From brd data [1] and zram data [2] from my setup :- >>>>> >>>>> IOPs (old->new) | sys cpu% (old->new) >>>>> -------------------------------------------------- >>>>> brd | 1.5x (3919 -> 5874) | 3x (29 -> 87) >>>>> zram | 1.09x (833 -> 917) | 8.8x (11 -> 97) >>>>> >>>>> brd:- >>>>> IOPs increased by ~1.5 times (50% up) >>>>> sys CPU percentage increased by ~3.0 times (200% up) >>>>> >>>>> zram:- >>>>> IOPs increased by ~1.09 times ( 9% up) >>>>> sys CPU percentage increased by ~8.81 times (781% up) >>>>> >>>>> This comparison clearly demonstrates that zram experiences a much more >>>>> substantial CPU load relative to the increase in IOPs compared to brd. >>>>> Such a significant difference might suggest a potential CPU regression >>>>> in zram ? >>>>> >>>>> Especially for zram, if applications are not expecting this high cpu >>>>> usage then we'll get regression reports with the default nowait >>>>> approach. How about we avoid something like this with one of the >>>>> following options ? >>>> Well, zram performs decompression/compression on the CPU (per-CPU >>>> crypto streams) for each IO operation, so zram IO is CPU intensive. >>> and that is exactly why I've raised this issue, are you okay with that ? >>> I'll send V2 with nowait enabled by default .. >> Did you check that it's actually nowait sane to begin with? I spent >> literally 30 seconds on that when you sent the first patch, and the >> partial/sync path does not look kosher. > I did check for it; it needs nowait-sane preparation in V2 with > something like this [1], plus returning an error with bio_wouldblock_error()

That looks pretty awful... Things like:

> @@ -779,8 +780,9 @@ static void zram_sync_read(struct work_struct *work)
>  	struct zram_work *zw = container_of(work, struct zram_work, work);
>  	struct bio_vec bv;
>  	struct bio bio;
> +	blk_opf_t nowait = zw->nowait ? REQ_NOWAIT : 0;
>
> -	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ);
> +	bio_init(&bio, zw->zram->bdev, &bv, 1, REQ_OP_READ | nowait);
>  	bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
>  	__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
>  	zw->error = submit_bio_wait(&bio);
>

make absolutely no sense, setting REQ_NOWAIT and then waiting on IO completion.

> when we end up in read_from_bdev_sync() and it is not called from > writeback_store() (rough changes [1])

writeback_store() will never have nowait set. Outside of that, pushing the nowait handling to the workqueue makes no sense, that flag only applies for inline issue.

> But after taking performance numbers repeatedly, the main concern I > failed to resolve is the numbers above and enabling nowait by default for > zram ...

It's a choice you make if you do nowait IO, normal block IO would not change at all. So I think it's fine, but the driver needs to be in a sane state to support nowait to begin with.
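Reduced to a sketch, the inline-issue rule being described here: a REQ_NOWAIT bio must be failed at the point the driver would otherwise block, and bio_wouldblock_error() is the real helper for that; once work has been punted to a workqueue and waited on, the flag has already been violated. The helper below is hypothetical, not the eventual zram fix:

static bool bail_if_nowait(struct bio *bio, bool would_block)
{
	if ((bio->bi_opf & REQ_NOWAIT) && would_block) {
		/* completes the bio with BLK_STS_AGAIN for the submitter */
		bio_wouldblock_error(bio);
		return true;	/* caller must not sleep or wait */
	}
	return false;
}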
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f6d90f1ba5cf..b2e419f15f71 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -36,6 +36,10 @@
 
 #include "zram_drv.h"
 
+static bool g_nowait;
+module_param_named(nowait, g_nowait, bool, 0444);
+MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT. Default: False");
+
 static DEFINE_IDR(zram_index_idr);
 /* idr index must be protected */
 static DEFINE_MUTEX(zram_index_mutex);
@@ -1540,9 +1544,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
 				   u32 index, int offset, struct bio *bio)
 {
-	struct page *page = alloc_page(GFP_NOIO);
+	struct page *page;
 	int ret;
 
+	page = alloc_page(bio->bi_opf & REQ_NOWAIT ? GFP_NOWAIT : GFP_NOIO);
 	if (!page)
 		return -ENOMEM;
 
@@ -2215,6 +2220,8 @@ static int zram_add(void)
 	/* zram devices sort of resembles non-rotational disks */
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
 	blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
+	if (g_nowait)
+		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, zram->disk->queue);
 
 	/*
 	 * To ensure that we always get PAGE_SIZE aligned
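If this (NACKed) version were applied, the flag would presumably be enabled at load time with "modprobe zram nowait=1"; the 0444 permission above makes it visible read-only as /sys/module/zram/parameters/nowait. Usage inferred from the diff itself, not from a posted test.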
Allow user to set the QUEUE_FLAG_NOWAIT optionally using module parameter to retain the default behaviour. Also, update respective allocation flags in the write path. Following are the performance numbers with io_uring fio engine for random read, note that device has been populated fully with randwrite workload before taking these numbers :-

* linux-block (for-next) # grep IOPS zram*fio | column -t
default-nowait-off-1.fio: read: IOPS=802k, BW=3133MiB/s
default-nowait-off-2.fio: read: IOPS=796k, BW=3111MiB/s
default-nowait-off-3.fio: read: IOPS=796k, BW=3108MiB/s
nowait-on-1.fio: read: IOPS=857k, BW=3346MiB/s
nowait-on-2.fio: read: IOPS=857k, BW=3347MiB/s
nowait-on-3.fio: read: IOPS=858k, BW=3353MiB/s

* linux-block (for-next) # grep cpu zram*fio | column -t
default-nowait-off-1.fio: cpu : usr=5.82%, sys=13.54%, ctx=36301915
default-nowait-off-2.fio: cpu : usr=5.84%, sys=13.03%, ctx=37781937
default-nowait-off-3.fio: cpu : usr=5.84%, sys=12.90%, ctx=37492533
nowait-on-1.fio: cpu : usr=1.74%, sys=97.62%, ctx=24068,
nowait-on-2.fio: cpu : usr=1.74%, sys=97.57%, ctx=24674,
nowait-on-3.fio: cpu : usr=1.76%, sys=97.59%, ctx=24725,

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
---
 drivers/block/zram/zram_drv.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
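The fio job file behind these numbers isn't included in the thread; per the description above, it is a randread job using ioengine=io_uring against a zram device that was first filled by a randwrite pass. The exact block size, queue depth, and job count are not stated and would have to be assumed when reproducing the results.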