mbox series

[v3,0/8] Create large folios in iomap buffered write path

Message ID 20230612203910.724378-1-willy@infradead.org (mailing list archive)
Headers show
Series Create large folios in iomap buffered write path | expand

Message

Matthew Wilcox (Oracle) June 12, 2023, 8:39 p.m. UTC
Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries
to improve worst-case latency.  Unfortunately, this had the effect of
limiting the performance of:

fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
        -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
        -numjobs=4 -directory=/mnt/test

The problem ends up being lock contention on the i_pages spinlock as we
clear the writeback bit on each folio (and propagate that up through
the tree).  By using larger folios, we decrease the number of folios
to be processed by a factor of 256 for this benchmark, eliminating the
lock contention.

It's also the right thing to do.  This is a project that has been on
the back burner for years, it just hasn't been important enough to do
before now.

I think it's probably best if this goes through the iomap tree since
the changes outside iomap are either to the page cache or they're
trivial.

v3:
 - Fix the handling of compound highmem pages in copy_page_from_iter_atomic()
 - Rename fgp_t to fgf_t
 - Clarify some wording in the documentation

v2:
 - Fix misplaced semicolon
 - Rename fgp_order to fgp_set_order
 - Rename FGP_ORDER to FGP_GET_ORDER
 - Add fgp_t
 - Update the documentation for ->release_folio
 - Fix iomap_invalidate_folio()
 - Update iomap_release_folio()

Matthew Wilcox (Oracle) (8):
  iov_iter: Handle compound highmem pages in
    copy_page_from_iter_atomic()
  iomap: Remove large folio handling in iomap_invalidate_folio()
  doc: Correct the description of ->release_folio
  iomap: Remove unnecessary test from iomap_release_folio()
  filemap: Add fgf_t typedef
  filemap: Allow __filemap_get_folio to allocate large folios
  iomap: Create large folios in the buffered write path
  iomap: Copy larger chunks from userspace

 Documentation/filesystems/locking.rst | 15 ++++--
 fs/btrfs/file.c                       |  6 +--
 fs/f2fs/compress.c                    |  2 +-
 fs/f2fs/f2fs.h                        |  2 +-
 fs/gfs2/bmap.c                        |  2 +-
 fs/iomap/buffered-io.c                | 43 ++++++++--------
 include/linux/iomap.h                 |  2 +-
 include/linux/pagemap.h               | 71 ++++++++++++++++++++++-----
 lib/iov_iter.c                        | 43 ++++++++++------
 mm/filemap.c                          | 61 ++++++++++++-----------
 mm/folio-compat.c                     |  2 +-
 mm/readahead.c                        | 13 -----
 12 files changed, 159 insertions(+), 103 deletions(-)

Comments

Wang Yugui June 21, 2023, 12:03 p.m. UTC | #1
Hi,

> Commit ebb7fb1557b1 limited the length of ioend chains to 4096 entries
> to improve worst-case latency.  Unfortunately, this had the effect of
> limiting the performance of:
> 
> fio -name write-bandwidth -rw=write -bs=1024Ki -size=32Gi -runtime=30 \
>         -iodepth 1 -ioengine sync -zero_buffers=1 -direct=0 -end_fsync=1 \
>         -numjobs=4 -directory=/mnt/test
> 
> The problem ends up being lock contention on the i_pages spinlock as we
> clear the writeback bit on each folio (and propagate that up through
> the tree).  By using larger folios, we decrease the number of folios
> to be processed by a factor of 256 for this benchmark, eliminating the
> lock contention.
> 
> It's also the right thing to do.  This is a project that has been on
> the back burner for years, it just hasn't been important enough to do
> before now.
> 
> I think it's probably best if this goes through the iomap tree since
> the changes outside iomap are either to the page cache or they're
> trivial.
> 
> v3:
>  - Fix the handling of compound highmem pages in copy_page_from_iter_atomic()
>  - Rename fgp_t to fgf_t
>  - Clarify some wording in the documentation

This v3 patches broken linux 6.4-rc7.

fstests(btrfs/007 and more) will fail with the v3 patches .
but it works well without the v3 patches.

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/06/21

> v2:
>  - Fix misplaced semicolon
>  - Rename fgp_order to fgp_set_order
>  - Rename FGP_ORDER to FGP_GET_ORDER
>  - Add fgp_t
>  - Update the documentation for ->release_folio
>  - Fix iomap_invalidate_folio()
>  - Update iomap_release_folio()
> 
> Matthew Wilcox (Oracle) (8):
>   iov_iter: Handle compound highmem pages in
>     copy_page_from_iter_atomic()
>   iomap: Remove large folio handling in iomap_invalidate_folio()
>   doc: Correct the description of ->release_folio
>   iomap: Remove unnecessary test from iomap_release_folio()
>   filemap: Add fgf_t typedef
>   filemap: Allow __filemap_get_folio to allocate large folios
>   iomap: Create large folios in the buffered write path
>   iomap: Copy larger chunks from userspace
> 
>  Documentation/filesystems/locking.rst | 15 ++++--
>  fs/btrfs/file.c                       |  6 +--
>  fs/f2fs/compress.c                    |  2 +-
>  fs/f2fs/f2fs.h                        |  2 +-
>  fs/gfs2/bmap.c                        |  2 +-
>  fs/iomap/buffered-io.c                | 43 ++++++++--------
>  include/linux/iomap.h                 |  2 +-
>  include/linux/pagemap.h               | 71 ++++++++++++++++++++++-----
>  lib/iov_iter.c                        | 43 ++++++++++------
>  mm/filemap.c                          | 61 ++++++++++++-----------
>  mm/folio-compat.c                     |  2 +-
>  mm/readahead.c                        | 13 -----
>  12 files changed, 159 insertions(+), 103 deletions(-)
> 
> -- 
> 2.39.2
Matthew Wilcox (Oracle) July 10, 2023, 3:55 a.m. UTC | #2
On Wed, Jun 21, 2023 at 08:03:13PM +0800, Wang Yugui wrote:
> This v3 patches broken linux 6.4-rc7.
> 
> fstests(btrfs/007 and more) will fail with the v3 patches .
> but it works well without the v3 patches.

Sorry, I didn't see this email until just now when I was reviewing all
the comments I got on v3.

I think I found the problem, and it's in patch 1.  Please retest v4
when I send it in a few hours after it's completed a test run.