[v9,00/14] io-uring/xfs: support async buffered writes

Message ID 20220616212221.2024518-1-shr@fb.com (mailing list archive)

Message

Stefan Roesch June 16, 2022, 9:22 p.m. UTC
This patch series adds support for async buffered writes when using both
xfs and io-uring. Currently io-uring only supports buffered writes in the
slow path, by processing them in the io workers. With this patch series it is
now possible to support buffered writes in the fast path. To use the fast
path, the required pages must be in the page cache, the required locks in xfs
must be available immediately, and no additional blocks may need to be read
from disk.
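
The fast path / slow path split described above can be sketched in plain
userspace C. This is only a model under stated assumptions: struct
write_state, try_write_nowait() and submit_buffered_write() are hypothetical
names invented for illustration, not functions from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the three conditions named in the cover letter. */
struct write_state {
    bool pages_in_page_cache; /* all target pages are already cached */
    bool locks_available;     /* xfs locks can be taken without waiting */
    bool needs_disk_read;     /* blocks would have to be read from disk */
};

/* Which path ends up handling the request (illustrative only). */
enum write_path { PATH_FAST, PATH_SLOW };

/* Non-blocking attempt: 0 on success, -EAGAIN if it would have to block. */
static int try_write_nowait(const struct write_state *s)
{
    assert(s != NULL);
    if (!s->pages_in_page_cache || !s->locks_available || s->needs_disk_read)
        return -EAGAIN;
    return 0;
}

/* Try the write inline first; on -EAGAIN hand it to a worker instead. */
static enum write_path submit_buffered_write(const struct write_state *s)
{
    if (try_write_nowait(s) == 0)
        return PATH_FAST;
    /* A real io worker would now repeat the write and be allowed to block. */
    return PATH_SLOW;
}
```

If any one of the three conditions fails, the request takes the slow path,
which is the pre-series behavior for every buffered write.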

Updating the inode can take time. An optimization has been implemented for
the time update. Time updates will be processed in the slow path. While a
time update is already in progress, other write requests for the same file
can skip the update of the modification time.

Performance results:
  For fio the following results have been obtained with a queue depth of
  1 and 4k block size (runtime 600 secs):

                 sequential writes:
                 without patch    with patch    libaio      psync
  iops:          77k              209k          195k        233k
  bw:            314MB/s          854MB/s       790MB/s     953MB/s
  clat:          9600ns           120ns         540ns       3000ns


For an io depth of 1, the new patch improves throughput by about 2.7 times
(compared to the existing behavior, where buffered writes are processed by an
io-worker process) and reduces completion latency (clat) from 9600ns to 120ns.
To achieve the same or better performance with the existing code an io depth
of 4 is required. Increasing the iodepth further does not lead to improvements.
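
The ratios can be read directly off the fio table. The following small C
sketch recomputes them from the table figures; it also cross-checks the
bandwidth row against the iops row, assuming that "4k" means 4096 bytes and
that MB means 10^6 bytes (both are assumptions, not stated in the table).

```c
#include <assert.h>

/* Figures copied from the fio table (queue depth 1, 4k block size). */
static const double iops_without = 77e3;
static const double iops_with = 209e3;
static const double clat_without_ns = 9600.0;
static const double clat_with_ns = 120.0;

/* Ratio of two positive measurements. */
static double ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* Bandwidth in MB/s implied by an iops figure at a given block size. */
static double implied_mb_per_s(double iops, double block_bytes)
{
    return iops * block_bytes / 1e6;
}
```

With these inputs, ratio(iops_with, iops_without) is roughly 2.71,
ratio(clat_without_ns, clat_with_ns) is 80, and
implied_mb_per_s(iops_without, 4096.0) is roughly 315, which is close to the
314MB/s reported in the bw row.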



Support for async buffered writes:

  To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In
  addition the check in generic_write_checks is modified to allow for async
  buffered writes that have this flag set.
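
  A minimal userspace model of the modified check, as a sketch only. The
  flag names follow the text above, but the bit values, the *_model structs,
  the function name write_checks_nowait() and the -EINVAL return code are
  assumptions of this example rather than the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative bit values; the real kernel constants differ. */
#define IOCB_NOWAIT      (1u << 0)
#define IOCB_DIRECT      (1u << 1)
#define FMODE_BUF_WASYNC (1u << 0)

/* Hypothetical stand-ins for struct file and struct kiocb. */
struct file_model { unsigned int f_mode; };
struct kiocb_model {
    unsigned int ki_flags;
    const struct file_model *ki_filp;
};

/*
 * A nowait write is accepted if it is a direct write, or if the file
 * advertises async buffered write support via FMODE_BUF_WASYNC.
 */
static int write_checks_nowait(const struct kiocb_model *iocb)
{
    assert(iocb != NULL && iocb->ki_filp != NULL);

    if ((iocb->ki_flags & IOCB_NOWAIT) &&
        !((iocb->ki_flags & IOCB_DIRECT) ||
          (iocb->ki_filp->f_mode & FMODE_BUF_WASYNC)))
        return -EINVAL; /* assumed error code for this sketch */
    return 0;
}
```

  In this model a filesystem opts in by setting FMODE_BUF_WASYNC on the file;
  files without the flag keep rejecting nowait buffered writes.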

  Changes to the iomap page create function to allow the caller to specify
  the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set
  and specifies the requested gfp flags.
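
  The flag translation can be sketched as follows. This is a simplified
  model: struct alloc_policy is a hypothetical stand-in for a gfp mask, the
  bit values are illustrative, and both helper names are invented for this
  example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the real kernel constants differ. */
#define IOCB_NOWAIT  (1u << 0)
#define IOMAP_NOWAIT (1u << 0)

static_assert(IOMAP_NOWAIT != 0u, "flag must be a non-zero bit");

/* Hypothetical stand-in for a gfp mask: may the allocation sleep? */
struct alloc_policy { bool may_block; };

/* Translate the request flags into iomap flags. */
static unsigned int iomap_flags_from_iocb(unsigned int ki_flags)
{
    return (ki_flags & IOCB_NOWAIT) ? IOMAP_NOWAIT : 0u;
}

/* Derive the allocation policy from the iomap flags in one place. */
static struct alloc_policy alloc_policy_from_iomap(unsigned int iomap_flags)
{
    struct alloc_policy p = { .may_block = !(iomap_flags & IOMAP_NOWAIT) };
    return p;
}
```

  Deriving the policy from the flags keeps the decision in a single location
  instead of having every caller pick an allocation mode.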

  Adds async buffered write support to iomap and to the xfs iomap layer.

Support for async buffered writes and inode time modification:

  Splits the function that checks whether file privileges need to be removed
  into two functions: a check function and a function that removes the file
  privileges. The same split is done for the function that updates the file
  modification time.

  Implements an optimization: while a file modification time update is
  pending, other requests for the same file do not need to wait for it. This
  avoids punting a considerable number of buffered async write requests.

  Take the ilock in nowait mode if async buffered writes are enabled and enable
  the async buffered writes optimization in io_uring.
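
  Taking a lock in nowait mode means trying it once and returning -EAGAIN
  instead of waiting. A small userspace sketch using a C11 atomic flag;
  struct ilock_model and both functions are hypothetical and do not reflect
  how the xfs ilock is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode lock. */
struct ilock_model { atomic_flag held; };

/*
 * Acquire the lock. With nowait set, a busy lock yields -EAGAIN at once;
 * otherwise the caller spins until the lock is free (a real lock would
 * sleep instead of spinning).
 */
static int ilock_model_acquire(struct ilock_model *l, bool nowait)
{
    assert(l != NULL);
    while (atomic_flag_test_and_set(&l->held)) {
        if (nowait)
            return -EAGAIN;
    }
    return 0;
}

/* Release the lock so that the next acquire can succeed. */
static void ilock_model_release(struct ilock_model *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}
```

  A caller that receives -EAGAIN here would leave the request to the slow
  path rather than retry in place.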

Support for write throttling of async buffered writes:

  Add a no_wait parameter to the existing balance_dirty_pages() function. The
  function will return -EAGAIN if the parameter is true and write throttling is
  required.

  Add a new function called balance_dirty_pages_ratelimited_flags() that will be
  invoked from iomap_write_iter() if an async buffered write is requested.
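
  The throttling behavior described in this section can be modelled as
  follows. This is a sketch under assumptions: struct dirty_model, its two
  counters and balance_dirty_model() are invented for illustration, and real
  dirty throttling is considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dirty-page accounting for one writer. */
struct dirty_model {
    unsigned long nr_dirty;    /* pages currently dirty */
    unsigned long dirty_limit; /* threshold above which writers throttle */
};

/*
 * Returns 0 if the writer may continue. If throttling is required and
 * no_wait is set, returns -EAGAIN instead of sleeping.
 */
static int balance_dirty_model(struct dirty_model *d, bool no_wait)
{
    assert(d != NULL);
    if (d->nr_dirty <= d->dirty_limit)
        return 0; /* below the limit: no throttling needed */
    if (no_wait)
        return -EAGAIN; /* async writer must not sleep here */
    /* Blocking writer: model waiting until writeback cleaned enough pages. */
    d->nr_dirty = d->dirty_limit;
    return 0;
}
```

  An async writer that gets -EAGAIN leaves the dirty counters untouched; the
  blocking retry is then the one that waits for writeback.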
  
Enable async buffered write support in xfs:

  This enables async buffered writes for xfs.


Testing:
  This patch series has been tested with xfstests, fsx, fio, and individual
  test programs.


Changes:
  V9:
  - Added comment for function balance_dirty_pages_ratelimited_flags()
  - checking return code for iop allocation in iomap_page_create()
  
  V8:
  - Reverted changes to iomap_write_iter and used Matthew Wilcox's code review
    recommendation with an additional change to revert the iterator.
  - Removed patch "fs: Optimization for concurrent file time updates"
  - Setting flag value in file_modified_flags()
  - Removed additional spaces in comment in file_update_time()
  - Ran fsx with 1 billion ops against the changes (run passed)

  V7:
  - Change definition and if clause in "iomap: Add flags parameter to
    iomap_page_create()"
  - Added patch "iomap: Return error code from iomap_write_iter()" to address
    the problem Dave Chinner brought up: retrying memory allocation a second
    time when we are under memory pressure. 
  - Removed patch "xfs: Change function signature of xfs_ilock_iocb()"
  - Merged patch "xfs: Enable async buffered write support" with previous
    patch

  V6:
  - Pass in iter->flags to calls in iomap_page_create()
  
  V5:
  - Refreshed to 5.19-rc1
  - Merged patch 3 and patch 4
    "mm: Prepare balance_dirty_pages() for async buffered writes" and
    "mm: Add balance_dirty_pages_ratelimited_flags() function"
  - Reformatting long line in iomap_page_create()
  - Replacing gfp parameter with flags parameter in iomap_page_create()
    This makes sure that the gfp setting is done in one location.
  - Moved variable definition outside of loop in iomap_write_iter()
  - Merged patch 7 with patch 6.
  - Introduced __file_remove_privs() that gets the iocb_flags passed in
    as an additional parameter
  - Removed file_needs_remove_privs() function
  - Renamed file_needs_update_time() to inode_needs_update_time()
  - inode_needs_update_time() no longer passes the file pointer
  - Renamed file_modified_async() to file_modified_flags()
  - Made file_modified_flags() an internal function
  - Removed extern keyword in file_modified_async definition
  - Added kiocb_modified function.
  - Separate patch for changes to xfs_ilock_for_iomap()
  - Separate patch for changes to xfs_ilock_inode()
  - Renamed xfs_ilock_xfs_inode() back to xfs_ilock_iocb()
  - Renamed flags parameter to iocb_flags in function xfs_ilock_iocb()
  - Used inode_set_flags() to manipulate inode flags in the function
    file_modified_flags()

  V4:
  - Reformat new code in generic_write_checks_count().
  - Removed patch that introduced new function iomap_page_create_gfp().
  - Add gfp parameter to iomap_page_create() and change all callers
    All users will enforce the number of blocks check
  - Removed confusing statement in iomap async buffer support patch
  - Replace no_wait variable in __iomap_write_begin with check of
    IOMAP_NOWAIT for easier readability.
  - Moved else if clause in __iomap_write_begin into else clause for
    easier readability
  - Removed the balance_dirty_pages_ratelimited_async() function and
    reverted back to the earlier version that used the function
    balance_dirty_pages_ratelimited_flags()
  - Introduced the flag BDP_ASYNC.
  - Renamed variable in iomap_write_iter from i_mapping to mapping.
  - Directly call balance_dirty_pages_ratelimited_flags() in the function
    iomap_write_iter().
  - Re-ordered the patches.
  
  V3:
  - Reformat new code in generic_write_checks_count() to line length of 80.
  - Remove if condition in __iomap_write_begin to maintain current behavior.
  - use GFP_NOWAIT flag in __iomap_write_begin
  - rename need_file_remove_privs() function to file_needs_remove_privs()
  - rename do_file_remove_privs to __file_remove_privs()
  - add kernel documentation to file_remove_privs() function
  - rework else if branch in file_remove_privs() function
  - add kernel documentation to file_modified() function
  - add kernel documentation to file_modified_async() function
  - rename err variable in file_update_time to ret
  - rename function need_file_update_time() to file_needs_update_time()
  - rename function do_file_update_time() to __file_update_time()
  - don't move check for FMODE_NOCMTIME in generic helper
  - reformat __file_update_time for easier reading
  - add kernel documentation to file_update_time() function
  - fix if in file_update_time from < to <=
  - move modification of inode flags from do_file_update_time to file_modified()
    When this function is called, the caller must hold the inode lock.
  - 3 new patches from Jan to add new no_wait flag to balance_dirty_pages(),
    remove patch 12 from previous series
  - Make balance_dirty_pages_ratelimited_flags() a static function
  - Add new balance_dirty_pages_ratelimited_async() function
  
  V2:
  - Remove atomic allocation
  - Use direct write in xfs_buffered_write_iomap_begin()
  - Use xfs_ilock_for_iomap() in xfs_buffered_write_iomap_begin()
  - Remove no_wait check at the end of xfs_buffered_write_iomap_begin() for
    the COW path.
  - Pass xfs_inode pointer to xfs_ilock_iocb and rename function to
    xfs_ilock_xfs_inode
  - Replace existing uses of xfs_ilock_iocb with xfs_ilock_xfs_inode
  - Use xfs_ilock_xfs_inode in xfs_file_buffered_write()
  - Callers of xfs_ilock_for_iomap need to initialize lock mode. This is
    required so writes use an exclusive lock
  - Split off _balance_dirty_pages() from balance_dirty_pages() and return
    sleep time
  - Call _balance_dirty_pages() in balance_dirty_pages_ratelimited_flags()
  - Move call to balance_dirty_pages_ratelimited_flags() in iomap_write_iter()
    to the beginning of the loop



Jan Kara (3):
  mm: Move starting of background writeback into the main balancing loop
  mm: Move updates of dirty_exceeded into one place
  mm: Add balance_dirty_pages_ratelimited_flags() function

Stefan Roesch (11):
  iomap: Add flags parameter to iomap_page_create()
  iomap: Add async buffered write support
  iomap: Return -EAGAIN from iomap_write_iter()
  fs: Add check for async buffered writes to generic_write_checks
  fs: add __remove_file_privs() with flags parameter
  fs: Split off inode_needs_update_time and __file_update_time
  fs: Add async write file modification handling.
  io_uring: Add support for async buffered writes
  io_uring: Add tracepoint for short writes
  xfs: Specify lockmode when calling xfs_ilock_for_iomap()
  xfs: Add async buffered write support

 fs/inode.c                      | 168 +++++++++++++++++++++++---------
 fs/io_uring.c                   |  32 +++++-
 fs/iomap/buffered-io.c          |  71 +++++++++++---
 fs/read_write.c                 |   4 +-
 fs/xfs/xfs_file.c               |  11 +--
 fs/xfs/xfs_iomap.c              |  11 ++-
 include/linux/fs.h              |   4 +
 include/linux/writeback.h       |   7 ++
 include/trace/events/io_uring.h |  25 +++++
 mm/page-writeback.c             |  89 +++++++++++------
 10 files changed, 314 insertions(+), 108 deletions(-)


base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3

Comments

Jens Axboe June 22, 2022, 5:41 p.m. UTC | #1
Top posting - are people fine with queueing this up at this point? Will
need a bit of massaging for io_uring as certain things moved to another
file, but it's really minor. I'd do a separate topic branch for this.


Matthew Wilcox (Oracle) June 22, 2022, 7:35 p.m. UTC | #2
On Wed, Jun 22, 2022 at 11:41:14AM -0600, Jens Axboe wrote:
> Top posting - are people fine with queueing this up at this point? Will
> need a bit of massaging for io_uring as certain things moved to another
> file, but it's really minor. I'd do a separate topic branch for this.

I haven't had time to review this version, and I'm not likely to have
time before July 4th.

Jens Axboe June 22, 2022, 7:37 p.m. UTC | #3
On 6/22/22 1:35 PM, Matthew Wilcox wrote:
> On Wed, Jun 22, 2022 at 11:41:14AM -0600, Jens Axboe wrote:
>> Top posting - are people fine with queueing this up at this point? Will
>> need a bit of massaging for io_uring as certain things moved to another
>> file, but it's really minor. I'd do a separate topic branch for this.
> 
> I haven't had time to review this version, and I'm not likely to have
> time before July 4th.

I think Stefan addressed your previous concerns. The merge window isn't
around the corner, but it would be nice to get some -next coverage in the
meantime. So I don't think you being away should hold up that part, and
there will still be time to take a look once you're back.

Jan Kara June 22, 2022, 9:24 p.m. UTC | #4
On Wed 22-06-22 11:41:14, Jens Axboe wrote:
> Top posting - are people fine with queueing this up at this point? Will
> need a bit of massaging for io_uring as certain things moved to another
> file, but it's really minor. I'd do a separate topic branch for this.

I have no objections to merging this. The parts I felt confident about
enough look OK to me (and have my reviewed-by tag).

								Honza

> On 6/16/22 3:22 PM, Stefan Roesch wrote:
> > This patch series adds support for async buffered writes when using both
> > xfs and io-uring. Currently io-uring only supports buffered writes in the
> > slow path, by processing them in the io workers. With this patch series it is
> > now possible to support buffered writes in the fast path. To be able to use
> > the fast path the required pages must be in the page cache, the required locks
> > in xfs can be granted immediately and no additional blocks need to be read
> > form disk.
> > 
> > Updating the inode can take time. An optimization has been implemented for
> > the time update. Time updates will be processed in the slow path. While there
> > is already a time update in process, other write requests for the same file,
> > can skip the update of the modification time.
> >   
> > 
> > Performance results:
> >   For fio the following results have been obtained with a queue depth of
> >   1 and 4k block size (runtime 600 secs):
> > 
> >                  sequential writes:
> >                  without patch           with patch      libaio     psync
> >   iops:              77k                    209k          195K       233K
> >   bw:               314MB/s                 854MB/s       790MB/s    953MB/s
> >   clat:            9600ns                   120ns         540ns     3000ns
> > 
> > 
> > For an io depth of 1, the new patch improves throughput by over three times
> > (compared to the exiting behavior, where buffered writes are processed by an
> > io-worker process) and also the latency is considerably reduced. To achieve the
> > same or better performance with the exisiting code an io depth of 4 is required.
> > Increasing the iodepth further does not lead to improvements.
> > 
> > In addition the latency of buffered write operations is reduced considerably.
> > 
> > 
> > 
> > Support for async buffered writes:
> > 
> >   To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In
> >   addition the check in generic_write_checks is modified to allow for async
> >   buffered writes that have this flag set.
> > 
> >   Changes to the iomap page create function to allow the caller to specify
> >   the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set
> >   and specifies the requested gfp flags.
> > 
> >   Adds the iomap async buffered write support to the xfs iomap layer.
> >   Adds async buffered write support to the xfs iomap layer.
> > 
> > Support for async buffered write support and inode time modification
> > 
> >   Splits the functions for checking if the file privileges need to be removed in
> >   two functions: check function and a function for the removal of file privileges.
> >   The same split is also done for the function to update the file modification time.
> > 
> >   Implement an optimization that while a file modification time is pending other
> >   requests for the same file don't need to wait for the file modification update. 
> >   This avoids that a considerable number of buffered async write requests get
> >   punted.
> > 
> >   Take the ilock in nowait mode if async buffered writes are enabled and enable
> >   the async buffered writes optimization in io_uring.
> > 
> > Support for write throttling of async buffered writes:
> > 
> >   Add a no_wait parameter to the exisiting balance_dirty_pages() function. The
> >   function will return -EAGAIN if the parameter is true and write throttling is
> >   required.
> > 
> >   Add a new function called balance_dirty_pages_ratelimited_async() that will be
> >   invoked from iomap_write_iter() if an async buffered write is requested.
> >   
> > Enable async buffered write support in xfs
> >    This enables async buffered writes for xfs.
> > 
> > 
> > Testing:
> >   This patch has been tested with xfstests, fsx, fio and individual test programs.
> > 
> > 
> > Changes:
> >   V9:
> >   - Added comment for function balance_dirty_pages_ratelimited_flags()
> >   - checking return code for iop allocation in iomap_page_create()
> >   
> >   V8:
> >   - Reverted back changes to iomap_write_iter and used Mathew Wilcox code review
> >     recommendation with an additional change to revert the iterator.
> >   - Removed patch "fs: Optimization for concurrent file time updates" 
> >   - Setting flag value in file_modified_flags()
> >   - Removed additional spaces in comment in file_update_time()
> >   - Run fsx with 1 billion ops against the changes (Run passed)
> > 
> >   V7:
> >   - Change definition and if clause in " iomap: Add flags parameter to
> >     iomap_page_create()"
> >   - Added patch "iomap: Return error code from iomap_write_iter()" to address
> >     the problem Dave Chinner brought up: retrying memory allocation a second
> >     time when we are under memory pressure. 
> >   - Removed patch "xfs: Change function signature of xfs_ilock_iocb()"
> >   - Merged patch "xfs: Enable async buffered write support" with previous
> >     patch
> > 
> >   V6:
> >   - Pass in iter->flags to calls in iomap_page_create()
> >   
> >   V5:
> >   - Refreshed to 5.19-rc1
> >   - Merged patch 3 and patch 4
> >     "mm: Prepare balance_dirty_pages() for async buffered writes" and
> >     "mm: Add balance_dirty_pages_ratelimited_flags() function"
> >   - Reformatting long file in iomap_page_create()
> >   - Replacing gfp parameter with flags parameter in iomap_page_create()
> >     This makes sure that the gfp setting is done in one location.
> >   - Moved variable definition outside of loop in iomap_write_iter()
> >   - Merged patch 7 with patch 6.
> >   - Introduced __file_remove_privs() that get the iocb_flags passed in
> >     as an additional parameter
> >   - Removed file_needs_remove_privs() function
> >   - Renamed file_needs_update_time() inode_needs_update_time()
> >   - inode_needs_update_time() no longer passes the file pointer
> >   - Renamed file_modified_async() to file_modified_flags()
> >   - Made file_modified_flags() an internal function
> >   - Removed extern keyword in file_modified_async definition
> >   - Added kiocb_modified function.
> >   - Separate patch for changes to xfs_ilock_for_iomap()
> >   - Separate patch for changes to xfs_ilock_inode()
> >   - Renamed xfs_ilock_xfs_inode()n back to xfs_ilock_iocb()
> >   - Renamed flags parameter to iocb_flags in function xfs_ilock_iocb()
> >   - Used inode_set_flags() to manipulate inode flags in the function
> >     file_modified_flags()
> > 
> >   V4:
> >   - Reformat new code in generic_write_checks_count().
> >   - Removed patch that introduced new function iomap_page_create_gfp().
> >   - Add gfp parameter to iomap_page_create() and change all callers
> >     All users will enforce the number of blocks check
> >   - Removed confusing statement in iomap async buffer support patch
> >   - Replace no_wait variable in __iomap_write_begin with check of
> >     IOMAP_NOWAIT for easier readability.
> >   - Moved else if clause in __iomap_write_begin into else clause for
> >     easier readability
> >   - Removed the balance_dirty_pages_ratelimited_async() function and
> >     reverted back to the earlier version that used the function
> >     balance_dirty_pages_ratelimited_flags()
> >   - Introduced the flag BDP_ASYNC.
> >   - Renamed variable in iomap_write_iter from i_mapping to mapping.
> >   - Directly call balance_dirty_pages_ratelimited_flags() in the function
> >     iomap_write_iter().
> >   - Re-ordered the patches.
> >   
> >   V3:
> >   - Reformat new code in generic_write_checks_count() to line lengthof 80.
> >   - Remove if condition in __iomap_write_begin to maintain current behavior.
> >   - use GFP_NOWAIT flag in __iomap_write_begin
> >   - rename need_file_remove_privs() function to file_needs_remove_privs()
> >   - rename do_file_remove_privs to __file_remove_privs()
> >   - add kernel documentation to file_remove_privs() function
> >   - rework else if branch in file_remove_privs() function
> >   - add kernel documentation to file_modified() function
> >   - add kernel documentation to file_modified_async() function
> >   - rename err variable in file_update_time to ret
> >   - rename function need_file_update_time() to file_needs_update_time()
> >   - rename function do_file_update_time() to __file_update_time()
> >   - don't move check for FMODE_NOCMTIME in generic helper
> >   - reformat __file_update_time for easier reading
> >   - add kernel documentation to file_update_time() function
> >   - fix if in file_update_time from < to <=
> >   - move modification of inode flags from do_file_update_time to file_modified()
> >     When this function is called, the caller must hold the inode lock.
> >   - 3 new patches from Jan to add new no_wait flag to balance_dirty_pages(),
> >     remove patch 12 from previous series
> >   - Make balance_dirty_pages_ratelimited_flags() a static function
> >   - Add new balance_dirty_pages_ratelimited_async() function
> >   
> >   V2:
> >   - Remove atomic allocation
> >   - Use direct write in xfs_buffered_write_iomap_begin()
> >   - Use xfs_ilock_for_iomap() in xfs_buffered_write_iomap_begin()
> >   - Remove no_wait check at the end of xfs_buffered_write_iomap_begin() for
> >     the COW path.
> >   - Pass xfs_inode pointer to xfs_ilock_iocb and rename function to
> >     xfs_ilock_xfs_inode
> >   - Replace existing uses of xfs_ilock_iocb with xfs_ilock_xfs_inode
> >   - Use xfs_ilock_xfs_inode in xfs_file_buffered_write()
> >   - Callers of xfs_ilock_for_iomap need to initialize lock mode. This is
> >     required so writes use an exclusive lock
> >   - Split off _balance_dirty_pages() from balance_dirty_pages() and return
> >     sleep time
> >   - Call _balance_dirty_pages() in balance_dirty_pages_ratelimited_flags()
> >   - Move call to balance_dirty_pages_ratelimited_flags() in iomap_write_iter()
> >     to the beginning of the loop
> > 
> > 
> > 
> > Jan Kara (3):
> >   mm: Move starting of background writeback into the main balancing loop
> >   mm: Move updates of dirty_exceeded into one place
> >   mm: Add balance_dirty_pages_ratelimited_flags() function
> > 
> > Stefan Roesch (11):
> >   iomap: Add flags parameter to iomap_page_create()
> >   iomap: Add async buffered write support
> >   iomap: Return -EAGAIN from iomap_write_iter()
> >   fs: Add check for async buffered writes to generic_write_checks
> >   fs: add __remove_file_privs() with flags parameter
> >   fs: Split off inode_needs_update_time and __file_update_time
> >   fs: Add async write file modification handling.
> >   io_uring: Add support for async buffered writes
> >   io_uring: Add tracepoint for short writes
> >   xfs: Specify lockmode when calling xfs_ilock_for_iomap()
> >   xfs: Add async buffered write support
> > 
> >  fs/inode.c                      | 168 +++++++++++++++++++++++---------
> >  fs/io_uring.c                   |  32 +++++-
> >  fs/iomap/buffered-io.c          |  71 +++++++++++---
> >  fs/read_write.c                 |   4 +-
> >  fs/xfs/xfs_file.c               |  11 +--
> >  fs/xfs/xfs_iomap.c              |  11 ++-
> >  include/linux/fs.h              |   4 +
> >  include/linux/writeback.h       |   7 ++
> >  include/trace/events/io_uring.h |  25 +++++
> >  mm/page-writeback.c             |  89 +++++++++++------
> >  10 files changed, 314 insertions(+), 108 deletions(-)
> > 
> > 
> > base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3
> 
> 
> -- 
> Jens Axboe
>
Jens Axboe June 22, 2022, 10:27 p.m. UTC | #5
On Thu, 16 Jun 2022 14:22:07 -0700, Stefan Roesch wrote:
> This patch series adds support for async buffered writes when using both
> xfs and io-uring. Currently io-uring only supports buffered writes in the
> slow path, by processing them in the io workers. With this patch series it is
> now possible to support buffered writes in the fast path. To be able to use
> the fast path the required pages must be in the page cache, the required locks
> in xfs can be granted immediately and no additional blocks need to be read
> from disk.
> 
> [...]

Applied, thanks!

[01/14] mm: Move starting of background writeback into the main balancing loop
        commit: 29c36351d61fd08a2ed50a8028a7f752401dc88a
[02/14] mm: Move updates of dirty_exceeded into one place
        commit: a3fa4409eec3c094ad632ac1029094e061daf152
[03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
        commit: 407619d2cef3b4d74565999a255a17cf5d559fa4
[04/14] iomap: Add flags parameter to iomap_page_create()
        commit: 49b5cd0830c1e9aa0f9a3717ac11a74ef23b9d4e
[05/14] iomap: Add async buffered write support
        commit: ccb885b4392143cea1bdbd8a0f35f0e6d909b114
[06/14] iomap: Return -EAGAIN from iomap_write_iter()
        commit: f0f9828d64393ea2ce87bd97f033051c8d7a337f
[07/14] fs: Add check for async buffered writes to generic_write_checks
        commit: cba06e23bc664ef419d389f1ed4cee523f468f8f
[08/14] fs: add __remove_file_privs() with flags parameter
        commit: 79d8ac83d6305fd8e996f720f955191e0d8c63b9
[09/14] fs: Split off inode_needs_update_time and __file_update_time
        commit: 1899b196859bac61ad71c3b3916e06de4b65246c
[10/14] fs: Add async write file modification handling.
        commit: 4705f225a56f216a59e09f7c2df16daabb7b4f76
[11/14] io_uring: Add support for async buffered writes
        commit: 6c8bbd82a43a0c7937e3e8e38cf46fcd90e15e68
[12/14] io_uring: Add tracepoint for short writes
        commit: 6c33dae4526ad079af6432aaf76827d0a27a9690
[13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap()
        commit: ddda2d473df70607bb456c515d984d05bf689790
[14/14] xfs: Add async buffered write support
        commit: e9cfc64a27f7a581b8c5d14da4efccfeae9c63bd

Best regards,
Darrick J. Wong June 23, 2022, 12:29 a.m. UTC | #6
On Wed, Jun 22, 2022 at 04:27:07PM -0600, Jens Axboe wrote:
> On Thu, 16 Jun 2022 14:22:07 -0700, Stefan Roesch wrote:
> > This patch series adds support for async buffered writes when using both
> > xfs and io-uring. Currently io-uring only supports buffered writes in the
> > slow path, by processing them in the io workers. With this patch series it is
> > now possible to support buffered writes in the fast path. To be able to use
> > the fast path the required pages must be in the page cache, the required locks
> > in xfs can be granted immediately and no additional blocks need to be read
> > from disk.
> > 
> > [...]
> 
> Applied, thanks!
> 
> [01/14] mm: Move starting of background writeback into the main balancing loop
>         commit: 29c36351d61fd08a2ed50a8028a7f752401dc88a
> [02/14] mm: Move updates of dirty_exceeded into one place
>         commit: a3fa4409eec3c094ad632ac1029094e061daf152
> [03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
>         commit: 407619d2cef3b4d74565999a255a17cf5d559fa4
> [04/14] iomap: Add flags parameter to iomap_page_create()
>         commit: 49b5cd0830c1e9aa0f9a3717ac11a74ef23b9d4e
> [05/14] iomap: Add async buffered write support
>         commit: ccb885b4392143cea1bdbd8a0f35f0e6d909b114
> [06/14] iomap: Return -EAGAIN from iomap_write_iter()
>         commit: f0f9828d64393ea2ce87bd97f033051c8d7a337f

I'm not sure /what/ happened here, but I never received the full V9
series, and neither did lore:

https://lore.kernel.org/linux-fsdevel/165593682792.161026.12974983413174964699.b4-ty@kernel.dk/T/#t

As it is, I already have my hands full trying to figure out why
generic/522 reports file corruption after 20 minutes of running on
vanilla 5.19-rc3, so I don't think I'm going to get to this for a while
either.

The v8 series looked all right to me, but ********* I hate how our
development process relies on such unreliable **** tooling.  I don't
think it's a /great/ idea to be pushing new code into -next when both
the xfs and pagecache maintainers are too busy to read the whole thing
through... but did hch actually RVB the whole thing prior to v9?

--D

> [07/14] fs: Add check for async buffered writes to generic_write_checks
>         commit: cba06e23bc664ef419d389f1ed4cee523f468f8f
> [08/14] fs: add __remove_file_privs() with flags parameter
>         commit: 79d8ac83d6305fd8e996f720f955191e0d8c63b9
> [09/14] fs: Split off inode_needs_update_time and __file_update_time
>         commit: 1899b196859bac61ad71c3b3916e06de4b65246c
> [10/14] fs: Add async write file modification handling.
>         commit: 4705f225a56f216a59e09f7c2df16daabb7b4f76
> [11/14] io_uring: Add support for async buffered writes
>         commit: 6c8bbd82a43a0c7937e3e8e38cf46fcd90e15e68
> [12/14] io_uring: Add tracepoint for short writes
>         commit: 6c33dae4526ad079af6432aaf76827d0a27a9690
> [13/14] xfs: Specify lockmode when calling xfs_ilock_for_iomap()
>         commit: ddda2d473df70607bb456c515d984d05bf689790
> [14/14] xfs: Add async buffered write support
>         commit: e9cfc64a27f7a581b8c5d14da4efccfeae9c63bd
> 
> Best regards,
> -- 
> Jens Axboe
> 
>
Jens Axboe June 23, 2022, 12:50 a.m. UTC | #7
On 6/22/22 6:29 PM, Darrick J. Wong wrote:
> On Wed, Jun 22, 2022 at 04:27:07PM -0600, Jens Axboe wrote:
>> On Thu, 16 Jun 2022 14:22:07 -0700, Stefan Roesch wrote:
>>> This patch series adds support for async buffered writes when using both
>>> xfs and io-uring. Currently io-uring only supports buffered writes in the
>>> slow path, by processing them in the io workers. With this patch series it is
>>> now possible to support buffered writes in the fast path. To be able to use
>>> the fast path the required pages must be in the page cache, the required locks
>>> in xfs can be granted immediately and no additional blocks need to be read
>>> from disk.
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [01/14] mm: Move starting of background writeback into the main balancing loop
>>         commit: 29c36351d61fd08a2ed50a8028a7f752401dc88a
>> [02/14] mm: Move updates of dirty_exceeded into one place
>>         commit: a3fa4409eec3c094ad632ac1029094e061daf152
>> [03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
>>         commit: 407619d2cef3b4d74565999a255a17cf5d559fa4
>> [04/14] iomap: Add flags parameter to iomap_page_create()
>>         commit: 49b5cd0830c1e9aa0f9a3717ac11a74ef23b9d4e
>> [05/14] iomap: Add async buffered write support
>>         commit: ccb885b4392143cea1bdbd8a0f35f0e6d909b114
>> [06/14] iomap: Return -EAGAIN from iomap_write_iter()
>>         commit: f0f9828d64393ea2ce87bd97f033051c8d7a337f
> 
> I'm not sure /what/ happened here, but I never received the full V9
> series, and neither did lore:
> 
> https://lore.kernel.org/linux-fsdevel/165593682792.161026.12974983413174964699.b4-ty@kernel.dk/T/#t

Huh yes, didn't even notice that it's missing a few.

> As it is, I already have my hands full trying to figure out why
> generic/522 reports file corruption after 20 minutes of running on
> vanilla 5.19-rc3, so I don't think I'm going to get to this for a while
> either.
> 
> The v8 series looked all right to me, but ********* I hate how our
> development process relies on such unreliable **** tooling.  I don't

Me too, and the fact that email is getting worse and worse is not making
things any better...

> think it's a /great/ idea to be pushing new code into -next when both
> the xfs and pagecache maintainers are too busy to read the whole thing
> through... but did hch actually RVB the whole thing prior to v9?

Yes, hch did review the whole thing prior to v9. v9 has been pretty
quiet, but even v8 didn't have a whole lot. Which is to be expected for
a v9, this thing has been going for months.

We're only at -rc3 right now, so I think it's fine getting it some -next
exposure. It's not like it's getting pushed tomorrow, and if actual
concerns arise, let's just deal with them if that's the case. I'll check
in with folks before anything gets pushed certainly, I just don't think
it's fair to keep stalling when there are no real objections. Nothing
gets pushed unless the vested parties agree, obviously.
Darrick J. Wong June 23, 2022, 6:29 a.m. UTC | #8
On Wed, Jun 22, 2022 at 06:50:29PM -0600, Jens Axboe wrote:
> On 6/22/22 6:29 PM, Darrick J. Wong wrote:
> > On Wed, Jun 22, 2022 at 04:27:07PM -0600, Jens Axboe wrote:
> >> On Thu, 16 Jun 2022 14:22:07 -0700, Stefan Roesch wrote:
> >>> This patch series adds support for async buffered writes when using both
> >>> xfs and io-uring. Currently io-uring only supports buffered writes in the
> >>> slow path, by processing them in the io workers. With this patch series it is
> >>> now possible to support buffered writes in the fast path. To be able to use
> >>> the fast path the required pages must be in the page cache, the required locks
> >>> in xfs can be granted immediately and no additional blocks need to be read
> > >>> from disk.
> >>>
> >>> [...]
> >>
> >> Applied, thanks!
> >>
> >> [01/14] mm: Move starting of background writeback into the main balancing loop
> >>         commit: 29c36351d61fd08a2ed50a8028a7f752401dc88a
> >> [02/14] mm: Move updates of dirty_exceeded into one place
> >>         commit: a3fa4409eec3c094ad632ac1029094e061daf152
> >> [03/14] mm: Add balance_dirty_pages_ratelimited_flags() function
> >>         commit: 407619d2cef3b4d74565999a255a17cf5d559fa4
> >> [04/14] iomap: Add flags parameter to iomap_page_create()
> >>         commit: 49b5cd0830c1e9aa0f9a3717ac11a74ef23b9d4e
> >> [05/14] iomap: Add async buffered write support
> >>         commit: ccb885b4392143cea1bdbd8a0f35f0e6d909b114
> >> [06/14] iomap: Return -EAGAIN from iomap_write_iter()
> >>         commit: f0f9828d64393ea2ce87bd97f033051c8d7a337f
> > 
> > I'm not sure /what/ happened here, but I never received the full V9
> > series, and neither did lore:
> > 
> > https://lore.kernel.org/linux-fsdevel/165593682792.161026.12974983413174964699.b4-ty@kernel.dk/T/#t
> 
> Huh yes, didn't even notice that it's missing a few.
> 
> > As it is, I already have my hands full trying to figure out why
> > generic/522 reports file corruption after 20 minutes of running on
> > vanilla 5.19-rc3, so I don't think I'm going to get to this for a while
> > either.
> > 
> > The v8 series looked all right to me, but ********* I hate how our
> > development process relies on such unreliable **** tooling.  I don't
> 
> Me too, and the fact that email is getting worse and worse is not making
> things any better...
> 
> > think it's a /great/ idea to be pushing new code into -next when both
> > the xfs and pagecache maintainers are too busy to read the whole thing
> > through... but did hch actually RVB the whole thing prior to v9?
> 
> Yes, hch did review the whole thing prior to v9. v9 has been pretty
> quiet, but even v8 didn't have a whole lot. Which is to be expected for
> a v9, this thing has been going for months.

<nod>

> We're only at -rc3 right now, so I think it's fine getting it some -next
> exposure. It's not like it's getting pushed tomorrow, and if actual
> concerns arise, let's just deal with them if that's the case. I'll check
> in with folks before anything gets pushed certainly, I just don't think
> it's fair to keep stalling when there are no real objections. Nothing
> gets pushed unless the vested parties agree, obviously.

Ok.  Would you or Stefan mind sending the whole v9 series again, so I
can have one more look?  Hopefully vger won't just eat the series a
third time... :(

Huh.  Ok.  LWN seems to have gotten the whole thing:
https://lwn.net/ml/linux-mm/20220616212221.2024518-1-shr@fb.com/

I'll go read that in the meantime.  $DEITY I hate email.

--D

> -- 
> Jens Axboe
>