[PATCHv8,0/5] iomap: Add support for per-block dirty state to improve write performance

Message ID cover.1686050333.git.ritesh.list@gmail.com

Message

Ritesh Harjani (IBM) June 6, 2023, 11:43 a.m. UTC
Hello All,

Please find PATCHv8, which adds per-block dirty tracking to iomap.
As discussed earlier, this is required to improve write performance and reduce
write amplification in cases where either the blocksize is smaller than the
pagesize (such as on the Power platform with its 64k pagesize) or where we
have a large folio (such as on xfs, which currently supports large folios).
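
To make the mechanism concrete, here is a minimal userspace-style sketch of
the idea; every name in it is hypothetical, and the real implementation lives
in fs/iomap/buffered-io.c.

/*
 * Sketch of per-block dirty tracking. With a 64k folio and 4k blocks, one
 * folio covers 16 blocks. Without a per-block dirty bitmap, a single dirty
 * 4k block forces writeback of the whole 64k folio; with one, writeback
 * submits only the blocks that are actually dirty.
 */
#include <stddef.h>
#include <stdbool.h>

#define FOLIO_SIZE		(64 * 1024)
#define BLOCK_SIZE		(4 * 1024)
#define BLOCKS_PER_FOLIO	(FOLIO_SIZE / BLOCK_SIZE)	/* 16 */

struct folio_state {
	bool dirty[BLOCKS_PER_FOLIO];	/* one flag per block, simplified */
};

/* Mark the blocks backing byte range [off, off + len) dirty. */
static void mark_range_dirty(struct folio_state *fs, size_t off, size_t len)
{
	for (size_t b = off / BLOCK_SIZE; b <= (off + len - 1) / BLOCK_SIZE; b++)
		fs->dirty[b] = true;
}

/* Writeback walks the flags and submits only the dirty blocks. */
static size_t writeback_dirty_bytes(struct folio_state *fs)
{
	size_t bytes = 0;

	for (size_t b = 0; b < BLOCKS_PER_FOLIO; b++) {
		if (fs->dirty[b]) {
			bytes += BLOCK_SIZE;	/* would submit I/O here */
			fs->dirty[b] = false;
		}
	}
	return bytes;
}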

v7 -> v8
==========
1. Renamed iomap_page -> iomap_folio & iop -> iof in Patch-1 itself.
2. Changed the naming convention of the iomap_iop_** functions to take iop
   as a function argument (as Darrick suggested); see the signature sketch
   after this list. Also renamed
   iomap_iop_test_full_uptodate() -> iomap_iop_is_fully_uptodate() &
   iomap_iop_test_block_uptodate() -> iomap_iop_is_block_uptodate()
   (similarly for the dirty state).
3. Added a comment in iomap_set_range_dirty() to explain why we pass the
   inode argument to it.
4. Added Reviewed-by from Darrick to Patch-3 & Patch-4, as these patches
   were unchanged in v8.
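
For illustration only, the accessor shape described in point 2 might look
roughly like the declarations below; the argument lists are my assumption
and are not copied from the patches.

/* Sketch of the renamed accessors; exact signatures are an assumption. */
static bool iomap_iop_is_fully_uptodate(struct folio *folio,
					struct iomap_folio *iof);
static bool iomap_iop_is_block_uptodate(struct iomap_folio *iof,
					unsigned int block);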

Thanks for all the review comments. I think the patch series is in much
better shape now. Please let me know if this looks good.

Patchv6 -> Patchv7
==================
1. Fixed __maybe_unused annotation.
2. Added patch-4:
   iomap: Refactor iomap_write_delalloc_punch() function out

RFCv5 -> PATCHv6:
=================
1. Addressed review comments from Brian, Christoph and Matthew.
   @Christoph:
     - I have renamed the higher-level functions such as iop_alloc/iop_free()
       to iomap_iop_alloc/free() in v6.
     - As for the low-level bitmap accessor functions, I couldn't find any
       better naming than iop_test/set/clear_**. I could have gone for
       iomap_iop__test/set/clear_** or iomap__iop_test/set/clear_**, but
       I wasn't convinced by either of the above, as they also lengthen the
       function names.
       Besides, the iop_test/set/clear_** accessor functions for uptodate and
       dirty status tracking make sense, as we are sure we have a valid iop
       in such cases. Please let me know if this looks ok to you.
2. I tried testing gfs2 (initially with no patches) with xfstests, but I
   always ended up in one deadlock or another (I couldn't spend any time
   debugging that). I also ran it with -x log, but it still always failed
   for me.
   @Andreas:
   - could you please suggest how I can test gfs2 with these patches? I see
     gfs2 can have a smaller blocksize and it uses the iomap buffered I/O
     path. It would be good if we could get these patches tested on it too.

3. I have now run a good amount of fstests on these patches on the platforms
   below, and I haven't found any new failures in my testing so far:
   arm64 (64k pagesize): with 4k -g quick
   Power: with 4k -g auto
   x86: 1k, 4k with -g auto and adv_auto

From my testing so far these patches look stable to me. If they look good to
the reviewers as well, do you think this series can be queued to linux-next
for wider testing?

Performance numbers copied from the last patch's commit message
===============================================================
Performance testing of the fio workload below reveals a ~16x performance
improvement using nvme with XFS (4k blocksize) on Power (64k pagesize).
FIO-reported write bandwidth improved from around ~28 MBps to ~452 MBps.
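
(A back-of-the-envelope reading of the 16x figure, not taken from the commit
message: with a 64k pagesize and a 4k blocksize there are 16 blocks per folio,
so a random 4k write that previously forced writeback of the whole 64k folio
now writes back only the one dirty block, and 452/28 is roughly 16.)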

1. <test_randwrite.fio>
[global]
	ioengine=psync
	rw=randwrite
	overwrite=1
	pre_read=1
	direct=0
	bs=4k
	size=1G
	dir=./
	numjobs=8
	fdatasync=1
	runtime=60
	iodepth=64
	group_reporting=1

[fio-run]
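
(To reproduce: save the job file above as test_randwrite.fio on the
filesystem under test and run "fio test_randwrite.fio"; with
group_reporting=1 the quoted bw is fio's aggregate write bandwidth across
the 8 jobs.)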

2. Also, our internal performance team reported that this series improves
   their database workload performance by around 83% (with XFS on Power).

Ritesh Harjani (IBM) (5):
  iomap: Rename iomap_page to iomap_folio and others
  iomap: Renames and refactor iomap_folio state bitmap handling
  iomap: Refactor iomap_write_delalloc_punch() function out
  iomap: Allocate iof in ->write_begin() early
  iomap: Add per-block dirty state tracking to improve performance

 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 378 +++++++++++++++++++++++++++++------------
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/file.c       |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 269 insertions(+), 116 deletions(-)

--
2.40.1

Comments

Matthew Wilcox June 6, 2023, 12:37 p.m. UTC | #1
On Tue, Jun 06, 2023 at 05:13:47PM +0530, Ritesh Harjani (IBM) wrote:
> Hello All,
> 
> Please find PATCHv8, which adds per-block dirty tracking to iomap.
> As discussed earlier, this is required to improve write performance and reduce
> write amplification in cases where either the blocksize is smaller than the
> pagesize (such as on the Power platform with its 64k pagesize) or where we
> have a large folio (such as on xfs, which currently supports large folios).

You're moving too fast.  Please, allow at least a few hours between
getting review comments and sending a new version.

> v7 -> v8
> ==========
> 1. Renamed iomap_page -> iomap_folio & iop -> iof in Patch-1 itself.

I don't think iomap_folio is the right name.  Indeed, I did not believe
that iomap_page was the right name.  As I said on #xfs recently ...

<willy> i'm still not crazy about iomap_page as the name of that
   data structure.  and calling the variable 'iop' just seems doomed
   to be trouble.  how do people feel about either iomap_block_state or
   folio_block_state ... or even just calling it block_state since it's
   local to iomap/buffered-io.c
<willy> we'd then call the variable either ibs or fbs, both of which
   have some collisions in the kernel, but none in filesystems
<dchinner> willy - sounds reasonable
Ritesh Harjani (IBM) June 6, 2023, 1 p.m. UTC | #2
Matthew Wilcox <willy@infradead.org> writes:

> On Tue, Jun 06, 2023 at 05:13:47PM +0530, Ritesh Harjani (IBM) wrote:
>> Hello All,
>> 
>> Please find PATCHv8, which adds per-block dirty tracking to iomap.
>> As discussed earlier, this is required to improve write performance and reduce
>> write amplification in cases where either the blocksize is smaller than the
>> pagesize (such as on the Power platform with its 64k pagesize) or where we
>> have a large folio (such as on xfs, which currently supports large folios).
>
> You're moving too fast.  Please, allow at least a few hours between
> getting review comments and sending a new version.
>

Sorry about that. I felt those were mostly mechanical conversion changes.
I will keep that in mind.

>> v7 -> v8
>> ==========
>> 1. Renamed iomap_page -> iomap_folio & iop -> iof in Patch-1 itself.
>
> I don't think iomap_folio is the right name.  Indeed, I did not believe
> that iomap_page was the right name.  As I said on #xfs recently ...
>
> <willy> i'm still not crazy about iomap_page as the name of that
>    data structure.  and calling the variable 'iop' just seems doomed
>    to be trouble.  how do people feel about either iomap_block_state or
>    folio_block_state ... or even just calling it block_state since it's
>    local to iomap/buffered-io.c
> <willy> we'd then call the variable either ibs or fbs, both of which
>    have some collisions in the kernel, but none in filesystems
> <dchinner> willy - sounds reasonable

Both seem equally reasonable to me. If others are ok with both, shall we go
with iomap_block_state and ibs?

I see it as an "iomap_block_state" that is local to the iomap buffered-io
layer, tracks per-block state within a folio, and gets attached to
folio->private.
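
For illustration, a minimal sketch of such an object; the field names are my
guesses for the discussion, not taken from the patches:

/*
 * Hypothetical per-folio block-state object; would be attached with
 * folio_attach_private(folio, ibs).
 */
struct iomap_block_state {
	spinlock_t	state_lock;	/* protects the bitmap below */
	/*
	 * One bit per block of uptodate state followed by one bit per block
	 * of dirty state, sized at allocation time for
	 * folio_size(folio) / i_blocksize(inode) blocks.
	 */
	unsigned long	state[];
};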

-ritesh
Darrick J. Wong June 6, 2023, 3:16 p.m. UTC | #3
On Tue, Jun 06, 2023 at 06:30:35PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
> 
> > On Tue, Jun 06, 2023 at 05:13:47PM +0530, Ritesh Harjani (IBM) wrote:
> >> Hello All,
> >> 
> >> Please find PATCHv8, which adds per-block dirty tracking to iomap.
> >> As discussed earlier, this is required to improve write performance and reduce
> >> write amplification in cases where either the blocksize is smaller than the
> >> pagesize (such as on the Power platform with its 64k pagesize) or where we
> >> have a large folio (such as on xfs, which currently supports large folios).
> >
> > You're moving too fast.  Please, allow at least a few hours between
> > getting review comments and sending a new version.
> >
> 
> Sorry about that. I felt those were mostly mechanical conversion changes.
> I will keep that in mind.
> 
> >> v7 -> v8
> >> ==========
> >> 1. Renamed iomap_page -> iomap_folio & iop -> iof in Patch-1 itself.
> >
> > I don't think iomap_folio is the right name.  Indeed, I did not believe
> > that iomap_page was the right name.  As I said on #xfs recently ...
> >
> > <willy> i'm still not crazy about iomap_page as the name of that
> >    data structure.  and calling the variable 'iop' just seems doomed
> >    to be trouble.  how do people feel about either iomap_block_state or
> >    folio_block_state ... or even just calling it block_state since it's
> >    local to iomap/buffered-io.c
> > <willy> we'd then call the variable either ibs or fbs, both of which
> >    have some collisions in the kernel, but none in filesystems
> > <dchinner> willy - sounds reasonable
> 
> Both seem equally reasonable to me. If others are ok with both, shall we go
> with iomap_block_state and ibs?

I've a slight preference for struct folio_block_state/folio_bs/fbs.

Or even struct iomap_folio_state/iofs/ifs.

IBS is an acronym for a rather uncomfortable medical condition... ;)

--D

> I see it as an "iomap_block_state" that is local to the iomap buffered-io
> layer, tracks per-block state within a folio, and gets attached to
> folio->private.
> 
> -ritesh
Christoph Hellwig June 7, 2023, 6:48 a.m. UTC | #4
FYI, daily reposts while there is still heavy discussion ongoing
make it really hard to chime into the discussion if you get preempted...
Christoph Hellwig June 7, 2023, 6:54 a.m. UTC | #5
On Tue, Jun 06, 2023 at 01:37:48PM +0100, Matthew Wilcox wrote:
> > 1. Renamed iomap_page -> iomap_folio & iop -> iof in Patch-1 itself.
> 
> I don't think iomap_folio is the right name.  Indeed, I did not believe
> that iomap_page was the right name.  As I said on #xfs recently ...
> 
> <willy> i'm still not crazy about iomap_page as the name of that
>    data structure.  and calling the variable 'iop' just seems doomed
>    to be trouble.  how do people feel about either iomap_block_state or
>    folio_block_state ... or even just calling it block_state since it's
>    local to iomap/buffered-io.c
> <willy> we'd then call the variable either ibs or fbs, both of which
>    have some collisions in the kernel, but none in filesystems
> <dchinner> willy - sounds reasonable

I'd keep an iomap prefix, but block_state looks fine to me.  So
iomap_block_state would be my preference.
Ritesh Harjani (IBM) June 7, 2023, 10:22 a.m. UTC | #6
Christoph Hellwig <hch@infradead.org> writes:

> FYI, daily reposts while there is still heavy discussion ongoing
> make it really hard to chime into the discussion if you get preempted...

Sure, noted. 
Thanks again for the help and review!

-ritesh