mbox series

[RFCv3,0/3] iomap: Add support for subpage dirty state tracking to improve write performance

Message ID cover.1677428794.git.ritesh.list@gmail.com (mailing list archive)
Headers show
Series iomap: Add support for subpage dirty state tracking to improve write performance | expand

Message

Ritesh Harjani (IBM) Feb. 26, 2023, 7:43 p.m. UTC
Hello All,

Please find the RFCv3 patchset which adds support for iomap subpage dirty state
tracking which improves write performance and should reduce the write
amplification problem on platforms with smaller filesystem blocksize compared
to pagesize.
E.g. On Power with 64k default pagesize and with 4k filesystem blocksize.

RFCv2 -> RFCv3
===============
1. Addressed review comments on adding accessor APIs for both uptodate and dirty
   iop bitmap. (todo-1 of rfcv2).
   Addressed few other review comments from Christoph & Matthew.
2. Performance testing of these patches reveal the same performance improvement
   i.e. the given fio workload shows 16x perf improvement on nvme drive.
   (completed todo-3 of rfcv2)
3. Addressed todo-4 of rfcv2

Few TODOs
===========
1. Test gfs2 and zonefs with these changes (todo-2 of rfcv2)
2. Look into todo-5 of rfcv2

xfstests testing with default options and 1k blocksize on x86 reveals no new
issues. Also didn't observe any surprises on Power with 4k blocksize.
(Please do suggest if there are any specific xfstests config options (for
xfs) which are good to get it tested for this patch series?)


Copy-Paste Cover letter of RFCv2
================================

RFC -> RFCv2
=============
1. One of the key fix in v2 is that earlier when the folio gets marked as dirty,
   we were never marking the bits dirty in iop bitmap.
   This patch adds support for iomap_dirty_folio() as new ->dirty_folio() aops
   callback, which sets the dirty bitmap in iop and later call filemap_dirty_folio().
   This was one of the review comment that was discussed in RFC.

2. One of the other key fix identified in testing was that iop structure could
   get allocated at the time of the writeback if the folio is uptodate.
   (since it can get freed during memory pressure or during
   truncate_inode_partial_folio() in case of large folio). This could then cause
   nothing to get written if we have not marked the necessary bits as dirty in
   iop->state[]. Patch-1 & Patch-3 takes care of that.

TODOs
======
1. I still need to work on macros which we could declare and use for easy
   reference to uptodate/dirty bits in iop->state[] bitmap (based on previous
   review comments).

2. Test xfstests on other filesystems which are using the iomap buffered write
   path (gfs2, zonefs).

3. Latest performance testing with this patch series (I am not expecting any
   surprises here. The perf improvements should be more or less similar to rfc).

4. To address one of the todo in Patch-3. I think I missed to address it and
   noticed it only now before sending. But it should be easily addressable.
   I can address it in the next revision along with others.

5. To address one of the other review comments like what happens with a large
   folio. Can we limit the size of bitmaps if the folio is too large e.g. > 2MB.

   [RH] - I can start looking into this area too, if we think these patches
   are looking good. My preference would be to work on todos 1-4 as part of this
   patch series and take up bitmap optimization as a follow-up work for next
   part. Please do let me know your thoughts and suggestions on this.

Note: I have done a 4k bs test with auto group on Power with 64k pagesize and
I haven't found any surprises. I am also running a full bench of all tests with
x86 and 1k blocksize, but it still hasn't completed. I can update the results
once it completes.

Also as we discussed, all the dirty and uptodate bitmap tracking code for
iomap_page's state[] bitmap, is still contained within iomap buffered-io.c file.

I would appreciate any review comments/feedback and help on this work i.e.
adding subpage size dirty tracking to reduce write amplification problem and
improve buffered write performance. Kindly note that w/o these patches,
below type of workload gets severly impacted.


Performance Results from RFC [1]:
=================================
1. Performance testing of below fio workload reveals ~16x performance
improvement on nvme with XFS (4k blocksize) on Power (64K pagesize)
FIO reported write bw scores, improved from ~28 MBps to ~452 MBps.

<test_randwrite.fio>
[global]
	ioengine=psync
	rw=randwrite
	overwrite=1
	pre_read=1
	direct=0
	bs=4k
	size=1G
	dir=./
	numjobs=8
	fdatasync=1
	runtime=60
	iodepth=64
	group_reporting=1

[fio-run]

2. Also our internal performance team reported that this patch improves there
   database workload performance by around ~83% (with XFS on Power)

[1]: https://lore.kernel.org/linux-xfs/cover.1666928993.git.ritesh.list@gmail.com/


Ritesh Harjani (IBM) (3):
  iomap: Allocate iop in ->write_begin() early
  iomap: Change uptodate variable name to state
  iomap: Support subpage size dirty tracking to improve write performance

 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 166 ++++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/super.c      |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 150 insertions(+), 23 deletions(-)

--
2.39.2