
[RFCv2,0/3] iomap: Add support for subpage dirty state tracking to improve write performance

Message ID cover.1675093524.git.ritesh.list@gmail.com (mailing list archive)


Ritesh Harjani (IBM) Jan. 30, 2023, 4:14 p.m. UTC
Hello All,

Please find RFCv2 of the patchset which adds support for iomap subpage dirty
state tracking. This improves write performance and should reduce the write
amplification problem on platforms where the filesystem blocksize is smaller
than the pagesize, e.g. Power with its 64k default pagesize and a 4k
filesystem blocksize.

RFC -> RFCv2
=============
1. One of the key fixes in v2: earlier, when a folio was marked dirty, we
   never marked the corresponding bits dirty in the iop bitmap.
   This version adds iomap_dirty_folio() as the new ->dirty_folio() aops
   callback, which sets the dirty bitmap in the iop and then calls
   filemap_dirty_folio(). This addresses one of the review comments from
   the RFC.

2. The other key fix, identified in testing, is that the iop structure may
   have to be allocated at writeback time if the folio is uptodate (the iop
   can be freed under memory pressure, or during
   truncate_inode_partial_folio() in the large folio case). Without that,
   nothing would get written back because the necessary bits were never
   marked dirty in iop->state[]. Patch-1 & Patch-3 take care of that.

TODOs
======
1. I still need to add the macros/helpers for easy reference to the
   uptodate/dirty bits in the iop->state[] bitmap (based on previous
   review comments).

2. Test xfstests on other filesystems which are using the iomap buffered write
   path (gfs2, zonefs).

3. Latest performance testing with this patch series (I am not expecting any
   surprises here; the perf improvements should be more or less the same as
   in the RFC).

4. Address one of the TODOs in Patch-3. I missed addressing it and only
   noticed now, before sending, but it should be easy to fix. I can address
   it in the next revision along with the others.

5. Address one of the other review comments: what happens with a large
   folio? Can we limit the size of the bitmaps if the folio is too large,
   e.g. > 2MB?

   [RH] - I can start looking into this area too, if we think these patches
   are otherwise in good shape. My preference would be to handle TODOs 1-4 as
   part of this patch series and take up the bitmap optimization as follow-up
   work. Please let me know your thoughts and suggestions on this.

Note: I have run a 4k blocksize test with the auto group on Power (64k
pagesize) and found no surprises. I am also running a full pass of all tests
on x86 with 1k blocksize; it has not completed yet, and I can update the
results once it does.

Also, as we discussed, all the dirty and uptodate bitmap tracking code for
iomap_page's state[] bitmap is still contained within fs/iomap/buffered-io.c.

I would appreciate any review comments/feedback and help on this work, i.e.
adding subpage-size dirty tracking to reduce the write amplification problem
and improve buffered write performance. Kindly note that without these
patches, the type of workload below gets severely impacted.


Performance Results from RFC [1]:
=================================
1. Performance testing of the below fio workload reveals a ~16x performance
improvement on NVMe with XFS (4k blocksize) on Power (64k pagesize):
the FIO-reported write bandwidth improved from ~28 MBps to ~452 MBps.

<test_randwrite.fio>
[global]
	ioengine=psync
	rw=randwrite
	overwrite=1
	pre_read=1
	direct=0
	bs=4k
	size=1G
	dir=./
	numjobs=8
	fdatasync=1
	runtime=60
	iodepth=64
	group_reporting=1

[fio-run]

2. Our internal performance team also reported that this patch series
   improves their database workload performance by ~83% (with XFS on Power).

[1]: https://lore.kernel.org/linux-xfs/cover.1666928993.git.ritesh.list@gmail.com/


Ritesh Harjani (IBM) (3):
  iomap: Move creation of iomap_page early in __iomap_write_begin
  iomap: Change uptodate variable name to state
  iomap: Support subpage size dirty tracking to improve write performance

 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 129 ++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/super.c      |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 111 insertions(+), 25 deletions(-)

--
2.39.1

Comments

Matthew Wilcox Jan. 30, 2023, 6:10 p.m. UTC | #1
On Mon, Jan 30, 2023 at 09:44:10PM +0530, Ritesh Harjani (IBM) wrote:
> TODOs
> ======
> 1. I still need to work on macros which we could declare and use for easy
>    reference to uptodate/dirty bits in iop->state[] bitmap (based on previous
>    review comments).

I'm not sure it was worth posting this series without doing this, tbh.

> 5. To address one of the other review comments like what happens with a large
>    folio. Can we limit the size of bitmaps if the folio is too large e.g. > 2MB.
> 
>    [RH] - I can start looking into this area too, if we think these patches
>    are looking good. My preference would be to work on todos 1-4 as part of this
>    patch series and take up bitmap optimization as a follow-up work for next
>    part. Please do let me know your thoughts and suggestions on this.

I was hoping to push you towards investigating a better data structure
than a bitmap.  I know a bitmap solves your immediate problem since
there are only 16 4kB blocks in a 64kB page, but in a linear-read
scenario, XFS is going to create large folios on POWER machines, all
the way up to 16MB IIUC.  Whatever your PMD page size is.  So you're
going to be exposed to this in some scenarios, even if you're not seeing
them in your current testing.
Ritesh Harjani (IBM) Jan. 30, 2023, 9:01 p.m. UTC | #2
On 23/01/30 06:10PM, Matthew Wilcox wrote:
> On Mon, Jan 30, 2023 at 09:44:10PM +0530, Ritesh Harjani (IBM) wrote:
> > TODOs
> > ======
> > 1. I still need to work on macros which we could declare and use for easy
> >    reference to uptodate/dirty bits in iop->state[] bitmap (based on previous
> >    review comments).
>
> I'm not sure it was worth posting this series without doing this, tbh.

Really sorry about that. Since these patches contain functional changes that
were missing from the last series and that you had pointed out (i.e. marking
the bits dirty when the folio is marked dirty), along with one other change
mentioned in the cover letter, I thought of pushing them out to get some
early review.

Sure, I will definitely work on it and will push out the next rev with these
changes included.

>
> > 5. To address one of the other review comments like what happens with a large
> >    folio. Can we limit the size of bitmaps if the folio is too large e.g. > 2MB.
> >
> >    [RH] - I can start looking into this area too, if we think these patches
> >    are looking good. My preference would be to work on todos 1-4 as part of this
> >    patch series and take up bitmap optimization as a follow-up work for next
> >    part. Please do let me know your thoughts and suggestions on this.
>
> I was hoping to push you towards investigating a better data structure
> than a bitmap. I know a bitmap solves your immediate problem since
> there are only 16 4kB blocks in a 64kB page, but in a linear-read
> scenario, XFS is going to create large folios on POWER machines, all
> the way up to 16MB IIUC.  Whatever your PMD page size is.  So you're
> going to be exposed to this in some scenarios, even if you're not seeing
> them in your current testing.

Got it!! Let me come back on this after giving it some more thought.

-ritesh
Ritesh Harjani (IBM) Feb. 2, 2023, 4:45 a.m. UTC | #3
Hello All,

Thanks for the review and feedback on the patch series.
I am travelling starting today. If I get some time in between, I will try to
work on the rest of the review comments. Once I am back next week, I will
prepare and send out v3 with the review comments addressed.

Thanks
-ritesh