From patchwork Tue Jan 12 01:07:41 2021
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12011999
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, avi@scylladb.com, andres@anarazel.de
Subject: [PATCH 1/6] iomap: convert iomap_dio_rw() to an args structure
Date: Tue, 12 Jan 2021 12:07:41 +1100
Message-Id: <20210112010746.1154363-2-david@fromorbit.com>
In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com>
References: <20210112010746.1154363-1-david@fromorbit.com>

From: Dave Chinner

Adding yet another parameter to the iomap_dio_rw() interface means changing lots of filesystems to add the parameter. Convert this interface to an args structure so in future we don't need to modify every caller to add a new parameter.
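As a minimal sketch of the calling convention change (hypothetical "foofs" filesystem; foofs_iomap_ops is a placeholder, the args fields are the ones introduced by this patch):

/* Old interface: each new behaviour means another positional argument. */
static ssize_t foofs_dio_read_old(struct kiocb *iocb, struct iov_iter *to)
{
        return iomap_dio_rw(iocb, to, &foofs_iomap_ops, NULL,
                            is_sync_kiocb(iocb));
}

/* New interface: callers fill in only the fields they need. */
static ssize_t foofs_dio_read_new(struct kiocb *iocb, struct iov_iter *to)
{
        struct iomap_dio_rw_args args = {
                .iocb                   = iocb,
                .iter                   = to,
                .ops                    = &foofs_iomap_ops,
                .wait_for_completion    = is_sync_kiocb(iocb),
        };

        return iomap_dio_rw(&args);
}

A future behaviour flag then becomes a new structure field with a zero default, so existing callers don't need to be touched.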
Signed-off-by: Dave Chinner Acked-by: Damien Le Moal --- fs/btrfs/file.c | 21 ++++++++++++++++----- fs/ext4/file.c | 24 ++++++++++++++++++------ fs/gfs2/file.c | 19 ++++++++++++++----- fs/iomap/direct-io.c | 30 ++++++++++++++---------------- fs/xfs/xfs_file.c | 30 +++++++++++++++++++++--------- fs/zonefs/super.c | 21 +++++++++++++++++---- include/linux/iomap.h | 16 ++++++++++------ 7 files changed, 110 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 0e41459b8de6..a49d9fa918d1 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1907,6 +1907,13 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) ssize_t err; unsigned int ilock_flags = 0; struct iomap_dio *dio = NULL; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &btrfs_dio_iomap_ops, + .dops = &btrfs_dio_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; @@ -1949,9 +1956,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) goto buffered; } - dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, - &btrfs_dio_ops, is_sync_kiocb(iocb)); - + dio = __iomap_dio_rw(&args); btrfs_inode_unlock(inode, ilock_flags); if (IS_ERR_OR_NULL(dio)) { @@ -3617,13 +3622,19 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = to, + .ops = &btrfs_dio_iomap_ops, + .dops = &btrfs_dio_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos)) return 0; btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); - ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops, - is_sync_kiocb(iocb)); + ret = iomap_dio_rw(&args); btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); return ret; } diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 3ed8c048fb12..436508be6d88 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -53,6 +53,12 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) { ssize_t ret; struct inode *inode = file_inode(iocb->ki_filp); + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = to, + .ops = &ext4_iomap_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock_shared(inode)) @@ -74,8 +80,7 @@ static ssize_t ext4_dio_read_iter(struct kiocb *iocb, struct iov_iter *to) return generic_file_read_iter(iocb, to); } - ret = iomap_dio_rw(iocb, to, &ext4_iomap_ops, NULL, - is_sync_kiocb(iocb)); + ret = iomap_dio_rw(&args); inode_unlock_shared(inode); file_accessed(iocb->ki_filp); @@ -459,9 +464,15 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) struct inode *inode = file_inode(iocb->ki_filp); loff_t offset = iocb->ki_pos; size_t count = iov_iter_count(from); - const struct iomap_ops *iomap_ops = &ext4_iomap_ops; bool extend = false, unaligned_io = false; bool ilock_shared = true; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &ext4_iomap_ops, + .dops = &ext4_dio_write_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; /* * We initially start with shared inode lock unless it is @@ -548,9 +559,10 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) } if (ilock_shared) - iomap_ops = &ext4_iomap_overwrite_ops; - ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, - is_sync_kiocb(iocb) 
|| unaligned_io || extend); + args.ops = &ext4_iomap_overwrite_ops; + if (unaligned_io || extend) + args.wait_for_completion = true; + ret = iomap_dio_rw(&args); if (ret == -ENOTBLK) ret = 0; diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index b39b339feddc..d44a5f9c5f34 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -788,6 +788,12 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to, struct gfs2_inode *ip = GFS2_I(file->f_mapping->host); size_t count = iov_iter_count(to); ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = to, + .ops = &gfs2_iomap_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; if (!count) return 0; /* skip atime */ @@ -797,9 +803,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to, if (ret) goto out_uninit; - ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, - is_sync_kiocb(iocb)); - + ret = iomap_dio_rw(&args); gfs2_glock_dq(gh); out_uninit: gfs2_holder_uninit(gh); @@ -815,6 +819,12 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, size_t len = iov_iter_count(from); loff_t offset = iocb->ki_pos; ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &gfs2_iomap_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; /* * Deferred lock, even if its a write, since we do no allocation on @@ -833,8 +843,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, if (offset + len > i_size_read(&ip->i_inode)) goto out; - ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, - is_sync_kiocb(iocb)); + ret = iomap_dio_rw(&args); if (ret == -ENOTBLK) ret = 0; out: diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 933f234d5bec..05cacc27578c 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -418,13 +418,13 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, * writes. The callers needs to fall back to buffered I/O in this case. */ struct iomap_dio * -__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - bool wait_for_completion) +__iomap_dio_rw(struct iomap_dio_rw_args *args) { + struct kiocb *iocb = args->iocb; + struct iov_iter *iter = args->iter; struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = file_inode(iocb->ki_filp); - size_t count = iov_iter_count(iter); + size_t count = iov_iter_count(args->iter); loff_t pos = iocb->ki_pos; loff_t end = iocb->ki_pos + count - 1, ret = 0; unsigned int flags = IOMAP_DIRECT; @@ -434,7 +434,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (!count) return NULL; - if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion)) + if (WARN_ON(is_sync_kiocb(iocb) && !args->wait_for_completion)) return ERR_PTR(-EIO); dio = kmalloc(sizeof(*dio), GFP_KERNEL); @@ -445,7 +445,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, atomic_set(&dio->ref, 1); dio->size = 0; dio->i_size = i_size_read(inode); - dio->dops = dops; + dio->dops = args->dops; dio->error = 0; dio->flags = 0; @@ -490,7 +490,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (ret) goto out_free_dio; - if (iov_iter_rw(iter) == WRITE) { + if (iov_iter_rw(args->iter) == WRITE) { /* * Try to invalidate cache pages for the range we are writing. 
* If this invalidation fails, let the caller fall back to @@ -503,7 +503,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, goto out_free_dio; } - if (!wait_for_completion && !inode->i_sb->s_dio_done_wq) { + if (!args->wait_for_completion && !inode->i_sb->s_dio_done_wq) { ret = sb_init_dio_done_wq(inode->i_sb); if (ret < 0) goto out_free_dio; @@ -514,12 +514,12 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, blk_start_plug(&plug); do { - ret = iomap_apply(inode, pos, count, flags, ops, dio, + ret = iomap_apply(inode, pos, count, flags, args->ops, dio, iomap_dio_actor); if (ret <= 0) { /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { - wait_for_completion = true; + args->wait_for_completion = true; ret = 0; } break; @@ -566,9 +566,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, * of the final reference, and we will complete and free it here * after we got woken by the I/O completion handler. */ - dio->wait_for_completion = wait_for_completion; + dio->wait_for_completion = args->wait_for_completion; if (!atomic_dec_and_test(&dio->ref)) { - if (!wait_for_completion) + if (!args->wait_for_completion) return ERR_PTR(-EIOCBQUEUED); for (;;) { @@ -596,13 +596,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, EXPORT_SYMBOL_GPL(__iomap_dio_rw); ssize_t -iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - bool wait_for_completion) +iomap_dio_rw(struct iomap_dio_rw_args *args) { struct iomap_dio *dio; - dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion); + dio = __iomap_dio_rw(args); if (IS_ERR_OR_NULL(dio)) return PTR_ERR_OR_ZERO(dio); return iomap_dio_complete(dio); diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5b0f93f73837..29f4204e551f 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -205,6 +205,12 @@ xfs_file_dio_aio_read( struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); size_t count = iov_iter_count(to); ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = to, + .ops = &xfs_read_iomap_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; trace_xfs_file_direct_read(ip, count, iocb->ki_pos); @@ -219,8 +225,7 @@ xfs_file_dio_aio_read( } else { xfs_ilock(ip, XFS_IOLOCK_SHARED); } - ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, - is_sync_kiocb(iocb)); + ret = iomap_dio_rw(&args); xfs_iunlock(ip, XFS_IOLOCK_SHARED); return ret; @@ -519,6 +524,13 @@ xfs_file_dio_aio_write( int iolock; size_t count = iov_iter_count(from); struct xfs_buftarg *target = xfs_inode_buftarg(ip); + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &xfs_direct_write_iomap_ops, + .dops = &xfs_dio_write_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; /* DIO must be aligned to device logical sector size */ if ((iocb->ki_pos | count) & target->bt_logical_sectormask) @@ -535,6 +547,12 @@ xfs_file_dio_aio_write( ((iocb->ki_pos + count) & mp->m_blockmask)) { unaligned_io = 1; + /* + * This must be the only IO in-flight. Wait on it before we + * release the iolock to prevent subsequent overlapping IO. + */ + args.wait_for_completion = true; + /* * We can't properly handle unaligned direct I/O to reflink * files yet, as we can't unshare a partial block. @@ -578,13 +596,7 @@ xfs_file_dio_aio_write( } trace_xfs_file_direct_write(ip, count, iocb->ki_pos); - /* - * If unaligned, this is the only IO in-flight. 
Wait on it before we - * release the iolock to prevent subsequent overlapping IO. - */ - ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops, - &xfs_dio_write_ops, - is_sync_kiocb(iocb) || unaligned_io); + ret = iomap_dio_rw(&args); out: xfs_iunlock(ip, iolock); diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index bec47f2d074b..edf353ad1edc 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -735,6 +735,13 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) bool append = false; size_t count; ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &zonefs_iomap_ops, + .dops = &zonefs_write_dio_ops, + .wait_for_completion = sync, + }; /* * For async direct IOs to sequential zone files, refuse IOCB_NOWAIT @@ -779,8 +786,8 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) if (append) ret = zonefs_file_dio_append(iocb, from); else - ret = iomap_dio_rw(iocb, from, &zonefs_iomap_ops, - &zonefs_write_dio_ops, sync); + ret = iomap_dio_rw(&args); + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && (ret > 0 || ret == -EIOCBQUEUED)) { if (ret > 0) @@ -909,6 +916,13 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) mutex_unlock(&zi->i_truncate_mutex); if (iocb->ki_flags & IOCB_DIRECT) { + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = to, + .ops = &zonefs_iomap_ops, + .dops = &zonefs_read_dio_ops, + .wait_for_completion = is_sync_kiocb(iocb), + }; size_t count = iov_iter_count(to); if ((iocb->ki_pos | count) & (sb->s_blocksize - 1)) { @@ -916,8 +930,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) goto inode_unlock; } file_accessed(iocb->ki_filp); - ret = iomap_dio_rw(iocb, to, &zonefs_iomap_ops, - &zonefs_read_dio_ops, is_sync_kiocb(iocb)); + ret = iomap_dio_rw(&args); } else { ret = generic_file_read_iter(iocb, to); if (ret == -EIO) diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 5bd3cac4df9c..16d20c01b5bb 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -256,12 +256,16 @@ struct iomap_dio_ops { struct bio *bio, loff_t file_offset); }; -ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - bool wait_for_completion); -struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, - const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - bool wait_for_completion); +struct iomap_dio_rw_args { + struct kiocb *iocb; + struct iov_iter *iter; + const struct iomap_ops *ops; + const struct iomap_dio_ops *dops; + bool wait_for_completion; +}; + +ssize_t iomap_dio_rw(struct iomap_dio_rw_args *args); +struct iomap_dio *__iomap_dio_rw(struct iomap_dio_rw_args *args); ssize_t iomap_dio_complete(struct iomap_dio *dio); int iomap_dio_iopoll(struct kiocb *kiocb, bool spin); From patchwork Tue Jan 12 01:07:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12012001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org 
(Postfix) with ESMTP id B8D90C43333 for ; Tue, 12 Jan 2021 01:08:54 +0000 (UTC)
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, avi@scylladb.com, andres@anarazel.de
Subject: [PATCH 2/6] iomap: move DIO NOWAIT setup up into filesystems
Date: Tue, 12 Jan 2021 12:07:42 +1100
Message-Id: <20210112010746.1154363-3-david@fromorbit.com>
In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com>
References: <20210112010746.1154363-1-david@fromorbit.com>

From: Dave Chinner

Add a parameter to iomap_dio_rw_args to allow callers to specify whether nonblocking (NOWAIT) submission semantics should be used by the DIO. This allows filesystems to add their own non-blocking constraints to DIO on top of the user-specified constraints held in the iocb.
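A sketch of how a filesystem would use the new field (hypothetical "foofs" write path; the foofs_* ops are placeholders): derive nonblocking from IOCB_NOWAIT once, use it for the filesystem's own locking policy, and pass it down so the DIO layer tests args->nonblocking instead of looking at iocb->ki_flags itself:

static ssize_t foofs_dio_write(struct kiocb *iocb, struct iov_iter *from)
{
        struct inode *inode = file_inode(iocb->ki_filp);
        ssize_t ret;
        struct iomap_dio_rw_args args = {
                .iocb                   = iocb,
                .iter                   = from,
                .ops                    = &foofs_iomap_ops,
                .dops                   = &foofs_dio_write_ops,
                .wait_for_completion    = is_sync_kiocb(iocb),
                .nonblocking            = (iocb->ki_flags & IOCB_NOWAIT),
        };

        /* The filesystem's own locking policy follows the same flag... */
        if (args.nonblocking) {
                if (!inode_trylock(inode))
                        return -EAGAIN;
        } else {
                inode_lock(inode);
        }

        /* ...and the DIO layer now honours args->nonblocking. */
        ret = iomap_dio_rw(&args);

        inode_unlock(inode);
        return ret;
}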
Signed-off-by: Dave Chinner --- fs/btrfs/file.c | 4 +++- fs/ext4/file.c | 5 +++-- fs/gfs2/file.c | 2 ++ fs/iomap/direct-io.c | 2 +- fs/xfs/xfs_file.c | 2 ++ fs/zonefs/super.c | 7 ++++--- include/linux/iomap.h | 3 +++ 7 files changed, 18 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index a49d9fa918d1..2e7c3b7b70fe 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1913,9 +1913,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) .ops = &btrfs_dio_iomap_ops, .dops = &btrfs_dio_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; - if (iocb->ki_flags & IOCB_NOWAIT) + if (args.nonblocking) ilock_flags |= BTRFS_ILOCK_TRY; /* If the write DIO is within EOF, use a shared lock */ @@ -3628,6 +3629,7 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to) .ops = &btrfs_dio_iomap_ops, .dops = &btrfs_dio_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos)) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 436508be6d88..0ce5c4cae172 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -472,6 +472,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) .ops = &ext4_iomap_ops, .dops = &ext4_dio_write_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; /* @@ -490,7 +491,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) if (offset + count > i_size_read(inode)) ilock_shared = false; - if (iocb->ki_flags & IOCB_NOWAIT) { + if (args.nonblocking) { if (ilock_shared) { if (!inode_trylock_shared(inode)) return -EAGAIN; @@ -519,7 +520,7 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) return ret; /* if we're going to block and IOCB_NOWAIT is set, return -EAGAIN */ - if ((iocb->ki_flags & IOCB_NOWAIT) && (unaligned_io || extend)) { + if (args.nonblocking && (unaligned_io || extend)) { ret = -EAGAIN; goto out; } diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index d44a5f9c5f34..ead246202144 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -793,6 +793,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to, .iter = to, .ops = &gfs2_iomap_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; if (!count) @@ -824,6 +825,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from, .iter = from, .ops = &gfs2_iomap_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; /* diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 05cacc27578c..c0dd2db1253b 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -478,7 +478,7 @@ __iomap_dio_rw(struct iomap_dio_rw_args *args) dio->flags |= IOMAP_DIO_WRITE_FUA; } - if (iocb->ki_flags & IOCB_NOWAIT) { + if (args->nonblocking) { if (filemap_range_has_page(mapping, pos, end)) { ret = -EAGAIN; goto out_free_dio; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 29f4204e551f..3ced2746db4d 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -210,6 +210,7 @@ xfs_file_dio_aio_read( .iter = to, .ops = &xfs_read_iomap_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; trace_xfs_file_direct_read(ip, count, iocb->ki_pos); @@ -530,6 +531,7 @@ xfs_file_dio_aio_write( .ops = 
&xfs_direct_write_iomap_ops, .dops = &xfs_dio_write_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; /* DIO must be aligned to device logical sector size */ diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index edf353ad1edc..486ff4872077 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -741,6 +741,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) .ops = &zonefs_iomap_ops, .dops = &zonefs_write_dio_ops, .wait_for_completion = sync, + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; /* @@ -748,11 +749,10 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from) * as this can cause write reordering (e.g. the first aio gets EAGAIN * on the inode lock but the second goes through but is now unaligned). */ - if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !sync && - (iocb->ki_flags & IOCB_NOWAIT)) + if (zi->i_ztype == ZONEFS_ZTYPE_SEQ && !sync && args.nonblocking) return -EOPNOTSUPP; - if (iocb->ki_flags & IOCB_NOWAIT) { + if (args.nonblocking) { if (!inode_trylock(inode)) return -EAGAIN; } else { @@ -922,6 +922,7 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) .ops = &zonefs_iomap_ops, .dops = &zonefs_read_dio_ops, .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; size_t count = iov_iter_count(to); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 16d20c01b5bb..3f85fc33a4c9 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -261,7 +261,10 @@ struct iomap_dio_rw_args { struct iov_iter *iter; const struct iomap_ops *ops; const struct iomap_dio_ops *dops; + /* wait for completion of submitted IO if true */ bool wait_for_completion; + /* use non-blocking IO submission semantics if true */ + bool nonblocking; }; ssize_t iomap_dio_rw(struct iomap_dio_rw_args *args); From patchwork Tue Jan 12 01:07:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12012003 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E521FC433E0 for ; Tue, 12 Jan 2021 01:09:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BC07B22DFB for ; Tue, 12 Jan 2021 01:09:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731589AbhALBJa (ORCPT ); Mon, 11 Jan 2021 20:09:30 -0500 Received: from mail110.syd.optusnet.com.au ([211.29.132.97]:51494 "EHLO mail110.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731358AbhALBJ3 (ORCPT ); Mon, 11 Jan 2021 20:09:29 -0500 Received: from dread.disaster.area (pa49-179-167-107.pa.nsw.optusnet.com.au [49.179.167.107]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 5BB4211717A; Tue, 12 Jan 2021 12:07:50 +1100 (AEDT) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1kz8A1-005WbE-Hb; Tue, 12 Jan 2021 12:07:49 +1100 Received: from dave by 
discord.disaster.area with local (Exim 4.94) (envelope-from ) id 1kz8A1-004qb2-AB; Tue, 12 Jan 2021 12:07:49 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, avi@scylladb.com, andres@anarazel.de Subject: [PATCH 3/6] xfs: factor out a xfs_ilock_iocb helper Date: Tue, 12 Jan 2021 12:07:43 +1100 Message-Id: <20210112010746.1154363-4-david@fromorbit.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com> References: <20210112010746.1154363-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=Ubgvt5aN c=1 sm=1 tr=0 cx=a_idp_d a=+wqVUQIkAh0lLYI+QRsciw==:117 a=+wqVUQIkAh0lLYI+QRsciw==:17 a=EmqxpYm9HcoA:10 a=Wfw2kvrMWTjfDHER2ywA:9 a=R76ju1TwCYIA:10 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Christoph Hellwig Add a helper to factor out the nowait locking logical for the read/write helpers. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_file.c | 55 +++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 3ced2746db4d..4eb4555516e4 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -197,6 +197,23 @@ xfs_file_fsync( return error; } +static int +xfs_ilock_iocb( + struct kiocb *iocb, + unsigned int lock_mode) +{ + struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!xfs_ilock_nowait(ip, lock_mode)) + return -EAGAIN; + } else { + xfs_ilock(ip, lock_mode); + } + + return 0; +} + STATIC ssize_t xfs_file_dio_aio_read( struct kiocb *iocb, @@ -220,12 +237,9 @@ xfs_file_dio_aio_read( file_accessed(iocb->ki_filp); - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = iomap_dio_rw(&args); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -246,13 +260,9 @@ xfs_file_dax_read( if (!count) return 0; /* skip atime */ - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } - + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = dax_iomap_rw(iocb, to, &xfs_read_iomap_ops); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -270,12 +280,9 @@ xfs_file_buffered_aio_read( trace_xfs_file_buffered_read(ip, iov_iter_count(to), iocb->ki_pos); - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) - return -EAGAIN; - } else { - xfs_ilock(ip, XFS_IOLOCK_SHARED); - } + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED); + if (ret) + return ret; ret = generic_file_read_iter(iocb, to); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -622,13 +629,9 @@ xfs_file_dax_write( size_t count; loff_t pos; - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!xfs_ilock_nowait(ip, iolock)) - return -EAGAIN; - } else { - xfs_ilock(ip, iolock); - } - + ret = xfs_ilock_iocb(iocb, iolock); + if (ret) + return ret; ret = xfs_file_aio_write_checks(iocb, from, &iolock); if (ret) goto out; From patchwork Tue Jan 12 01:07:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12011993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 
tests=BAYES_00 autolearn=unavailable version=3.4.0
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, avi@scylladb.com, andres@anarazel.de
Subject: [PATCH 4/6] xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware
Date: Tue, 12 Jan 2021 12:07:44 +1100
Message-Id: <20210112010746.1154363-5-david@fromorbit.com>
In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com>
References: <20210112010746.1154363-1-david@fromorbit.com>

From: Christoph Hellwig

Ensure we don't block on the iolock, or wait for I/O, in xfs_file_aio_write_checks() if the caller asked to avoid that.
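The shape of the change, as a rough sketch with a hypothetical helper (the real logic is in xfs_file_aio_write_checks() below, which uses xfs_break_layouts() on the blocking path and xfs_ilock_iocb() for the lock upgrade):

static int foofs_write_checks(struct kiocb *iocb, struct inode *inode)
{
        int error;

        if (iocb->ki_flags & IOCB_NOWAIT) {
                /* Non-blocking layout break; map EWOULDBLOCK to EAGAIN. */
                error = break_layout(inode, false);
                if (error == -EWOULDBLOCK)
                        error = -EAGAIN;
        } else {
                /* The blocking path may wait for layout holders. */
                error = break_layout(inode, true);
        }
        if (error)
                return error;

        /*
         * A write starting beyond EOF needs the gap between the old EOF and
         * the write zeroed, which can mean waiting for in-flight DIO to
         * drain. That cannot be done under IOCB_NOWAIT, so bail out.
         */
        if ((iocb->ki_flags & IOCB_NOWAIT) && iocb->ki_pos > i_size_read(inode))
                return -EAGAIN;

        return 0;
}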
Fixes: 29a5d29ec181 ("xfs: nowait aio support") Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_file.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 4eb4555516e4..512833ce1d41 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -341,7 +341,14 @@ xfs_file_aio_write_checks( if (error <= 0) return error; - error = xfs_break_layouts(inode, iolock, BREAK_WRITE); + if (iocb->ki_flags & IOCB_NOWAIT) { + error = break_layout(inode, false); + if (error == -EWOULDBLOCK) + error = -EAGAIN; + } else { + error = xfs_break_layouts(inode, iolock, BREAK_WRITE); + } + if (error) return error; @@ -352,7 +359,11 @@ xfs_file_aio_write_checks( if (*iolock == XFS_IOLOCK_SHARED && !IS_NOSEC(inode)) { xfs_iunlock(ip, *iolock); *iolock = XFS_IOLOCK_EXCL; - xfs_ilock(ip, *iolock); + error = xfs_ilock_iocb(iocb, *iolock); + if (error) { + *iolock = 0; + return error; + } goto restart; } /* @@ -374,6 +385,10 @@ xfs_file_aio_write_checks( isize = i_size_read(inode); if (iocb->ki_pos > isize) { spin_unlock(&ip->i_flags_lock); + + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + if (!drained_dio) { if (*iolock == XFS_IOLOCK_SHARED) { xfs_iunlock(ip, *iolock); @@ -607,7 +622,8 @@ xfs_file_dio_aio_write( trace_xfs_file_direct_write(ip, count, iocb->ki_pos); ret = iomap_dio_rw(&args); out: - xfs_iunlock(ip, iolock); + if (iolock) + xfs_iunlock(ip, iolock); /* * No fallback to buffered IO after short writes for XFS, direct I/O @@ -646,7 +662,8 @@ xfs_file_dax_write( error = xfs_setfilesize(ip, pos, ret); } out: - xfs_iunlock(ip, iolock); + if (iolock) + xfs_iunlock(ip, iolock); if (error) return error; From patchwork Tue Jan 12 01:07:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12011997 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9518CC43381 for ; Tue, 12 Jan 2021 01:08:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 74738230FD for ; Tue, 12 Jan 2021 01:08:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731483AbhALBIo (ORCPT ); Mon, 11 Jan 2021 20:08:44 -0500 Received: from mail108.syd.optusnet.com.au ([211.29.132.59]:37844 "EHLO mail108.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731454AbhALBIo (ORCPT ); Mon, 11 Jan 2021 20:08:44 -0500 Received: from dread.disaster.area (pa49-179-167-107.pa.nsw.optusnet.com.au [49.179.167.107]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id 00E761B4F38; Tue, 12 Jan 2021 12:08:00 +1100 (AEDT) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1kz8A1-005WbI-JP; Tue, 12 Jan 2021 12:07:49 +1100 Received: from dave by discord.disaster.area with local (Exim 4.94) (envelope-from ) id 1kz8A1-004qb8-Bm; Tue, 12 Jan 2021 12:07:49 +1100 From: Dave Chinner To: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, 
avi@scylladb.com, andres@anarazel.de Subject: [PATCH 5/6] xfs: split unaligned DIO write code out Date: Tue, 12 Jan 2021 12:07:45 +1100 Message-Id: <20210112010746.1154363-6-david@fromorbit.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com> References: <20210112010746.1154363-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=Ubgvt5aN c=1 sm=1 tr=0 cx=a_idp_d a=+wqVUQIkAh0lLYI+QRsciw==:117 a=+wqVUQIkAh0lLYI+QRsciw==:17 a=EmqxpYm9HcoA:10 a=20KFwNOVAAAA:8 a=AsoV7O_VX2oRa6YRAogA:9 Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Dave Chinner The unaligned DIO write path is more convulted than the normal path, and we are about to make it more complex. Keep the block aligned fast path dio write code trim and simple by splitting out the unaligned DIO code from it. Signed-off-by: Dave Chinner --- fs/xfs/xfs_file.c | 177 +++++++++++++++++++++++++++++----------------- 1 file changed, 113 insertions(+), 64 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 512833ce1d41..bba33be17eff 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -508,7 +508,7 @@ static const struct iomap_dio_ops xfs_dio_write_ops = { }; /* - * xfs_file_dio_aio_write - handle direct IO writes + * Handle block aligned direct IO writes * * Lock the inode appropriately to prepare for and issue a direct IO write. * By separating it from the buffered write path we remove all the tricky to @@ -518,35 +518,88 @@ static const struct iomap_dio_ops xfs_dio_write_ops = { * until we're sure the bytes at the new EOF have been zeroed and/or the cached * pages are flushed out. * - * In most cases the direct IO writes will be done holding IOLOCK_SHARED + * Returns with locks held indicated by @iolock and errors indicated by + * negative return values. + */ +STATIC ssize_t +xfs_file_dio_write_aligned( + struct xfs_inode *ip, + struct kiocb *iocb, + struct iov_iter *from) +{ + int iolock = XFS_IOLOCK_SHARED; + size_t count; + ssize_t ret; + struct iomap_dio_rw_args args = { + .iocb = iocb, + .iter = from, + .ops = &xfs_direct_write_iomap_ops, + .dops = &xfs_dio_write_ops, + .wait_for_completion = is_sync_kiocb(iocb), + .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), + }; + + ret = xfs_ilock_iocb(iocb, iolock); + if (ret) + return ret; + ret = xfs_file_aio_write_checks(iocb, from, &iolock); + if (ret) + goto out; + count = iov_iter_count(from); + + /* + * We don't need to hold the IOLOCK exclusively across the IO, so demote + * the iolock back to shared if we had to take the exclusive lock in + * xfs_file_aio_write_checks() for other reasons. + */ + if (iolock == XFS_IOLOCK_EXCL) { + xfs_ilock_demote(ip, XFS_IOLOCK_EXCL); + iolock = XFS_IOLOCK_SHARED; + } + + trace_xfs_file_direct_write(ip, count, iocb->ki_pos); + ret = iomap_dio_rw(&args); +out: + if (iolock) + xfs_iunlock(ip, iolock); + + /* + * No fallback to buffered IO after short writes for XFS, direct I/O + * will either complete fully or return an error. + */ + ASSERT(ret < 0 || ret == count); + return ret; +} + +/* + * Handle block unaligned direct IO writes + * + * In most cases direct IO writes will be done holding IOLOCK_SHARED * allowing them to be done in parallel with reads and other direct IO writes. 
* However, if the IO is not aligned to filesystem blocks, the direct IO layer - * needs to do sub-block zeroing and that requires serialisation against other + * may need to do sub-block zeroing and that requires serialisation against other * direct IOs to the same block. In this case we need to serialise the * submission of the unaligned IOs so that we don't get racing block zeroing in - * the dio layer. To avoid the problem with aio, we also need to wait for + * the dio layer. + * + * To provide the same serialisation for AIO, we also need to wait for * outstanding IOs to complete so that unwritten extent conversion is completed * before we try to map the overlapping block. This is currently implemented by * hitting it with a big hammer (i.e. inode_dio_wait()). * - * Returns with locks held indicated by @iolock and errors indicated by - * negative return values. + * This means that unaligned dio writes alwys block. There is no "nowait" fast + * path in this code - if IOCB_NOWAIT is set we simply return -EAGAIN up front + * and we don't have to worry about that anymore. */ -STATIC ssize_t -xfs_file_dio_aio_write( +static ssize_t +xfs_file_dio_write_unaligned( + struct xfs_inode *ip, struct kiocb *iocb, struct iov_iter *from) { - struct file *file = iocb->ki_filp; - struct address_space *mapping = file->f_mapping; - struct inode *inode = mapping->host; - struct xfs_inode *ip = XFS_I(inode); - struct xfs_mount *mp = ip->i_mount; - ssize_t ret = 0; - int unaligned_io = 0; - int iolock; - size_t count = iov_iter_count(from); - struct xfs_buftarg *target = xfs_inode_buftarg(ip); + int iolock = XFS_IOLOCK_EXCL; + size_t count; + ssize_t ret; struct iomap_dio_rw_args args = { .iocb = iocb, .iter = from, @@ -556,49 +609,25 @@ xfs_file_dio_aio_write( .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), }; - /* DIO must be aligned to device logical sector size */ - if ((iocb->ki_pos | count) & target->bt_logical_sectormask) - return -EINVAL; - /* - * Don't take the exclusive iolock here unless the I/O is unaligned to - * the file system block size. We don't need to consider the EOF - * extension case here because xfs_file_aio_write_checks() will relock - * the inode as necessary for EOF zeroing cases and fill out the new - * inode size as appropriate. + * This must be the only IO in-flight. Wait on it before we + * release the iolock to prevent subsequent overlapping IO. */ - if ((iocb->ki_pos & mp->m_blockmask) || - ((iocb->ki_pos + count) & mp->m_blockmask)) { - unaligned_io = 1; - - /* - * This must be the only IO in-flight. Wait on it before we - * release the iolock to prevent subsequent overlapping IO. - */ - args.wait_for_completion = true; + args.wait_for_completion = true; - /* - * We can't properly handle unaligned direct I/O to reflink - * files yet, as we can't unshare a partial block. - */ - if (xfs_is_cow_inode(ip)) { - trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count); - return -ENOTBLK; - } - iolock = XFS_IOLOCK_EXCL; - } else { - iolock = XFS_IOLOCK_SHARED; + /* + * We can't properly handle unaligned direct I/O to reflink + * files yet, as we can't unshare a partial block. 
+ */ + if (xfs_is_cow_inode(ip)) { + trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count); + return -ENOTBLK; } - if (iocb->ki_flags & IOCB_NOWAIT) { - /* unaligned dio always waits, bail */ - if (unaligned_io) - return -EAGAIN; - if (!xfs_ilock_nowait(ip, iolock)) - return -EAGAIN; - } else { - xfs_ilock(ip, iolock); - } + /* unaligned dio always waits, bail */ + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + xfs_ilock(ip, iolock); ret = xfs_file_aio_write_checks(iocb, from, &iolock); if (ret) @@ -612,13 +641,7 @@ xfs_file_dio_aio_write( * iolock if we had to take the exclusive lock in * xfs_file_aio_write_checks() for other reasons. */ - if (unaligned_io) { - inode_dio_wait(inode); - } else if (iolock == XFS_IOLOCK_EXCL) { - xfs_ilock_demote(ip, XFS_IOLOCK_EXCL); - iolock = XFS_IOLOCK_SHARED; - } - + inode_dio_wait(VFS_I(ip)); trace_xfs_file_direct_write(ip, count, iocb->ki_pos); ret = iomap_dio_rw(&args); out: @@ -633,6 +656,32 @@ xfs_file_dio_aio_write( return ret; } +static ssize_t +xfs_file_dio_write( + struct kiocb *iocb, + struct iov_iter *from) +{ + struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + struct xfs_mount *mp = ip->i_mount; + struct xfs_buftarg *target = xfs_inode_buftarg(ip); + size_t count = iov_iter_count(from); + + /* DIO must be aligned to device logical sector size */ + if ((iocb->ki_pos | count) & target->bt_logical_sectormask) + return -EINVAL; + + /* + * Don't take the exclusive iolock here unless the I/O is unaligned to + * the file system block size. We don't need to consider the EOF + * extension case here because xfs_file_aio_write_checks() will relock + * the inode as necessary for EOF zeroing cases and fill out the new + * inode size as appropriate. + */ + if ((iocb->ki_pos | count) & mp->m_blockmask) + return xfs_file_dio_write_unaligned(ip, iocb, from); + return xfs_file_dio_write_aligned(ip, iocb, from); +} + static noinline ssize_t xfs_file_dax_write( struct kiocb *iocb, @@ -783,7 +832,7 @@ xfs_file_write_iter( * CoW. In all other directio scenarios we do not * allow an operation to fall back to buffered mode. 
*/ - ret = xfs_file_dio_aio_write(iocb, from); + ret = xfs_file_dio_write(iocb, from); if (ret != -ENOTBLK) return ret; }

From patchwork Tue Jan 12 01:07:46 2021
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 12011995
From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, avi@scylladb.com, andres@anarazel.de
Subject: [PATCH 6/6] xfs: reduce exclusive locking on unaligned dio
Date: Tue, 12 Jan 2021 12:07:46 +1100
Message-Id: <20210112010746.1154363-7-david@fromorbit.com>
In-Reply-To: <20210112010746.1154363-1-david@fromorbit.com>
References: <20210112010746.1154363-1-david@fromorbit.com>

From: Dave Chinner

Attempt shared locking for unaligned DIO, but only if the underlying extent is already allocated and in written state. On failure, retry with the existing exclusive locking.

Test case is fio randrw of 512 byte IOs using AIO and an iodepth of 32 IOs.

Vanilla:
  READ: bw=4560KiB/s (4670kB/s), 4560KiB/s-4560KiB/s (4670kB/s-4670kB/s), io=134MiB (140MB), run=30001-30001msec
 WRITE: bw=4567KiB/s (4676kB/s), 4567KiB/s-4567KiB/s (4676kB/s-4676kB/s), io=134MiB (140MB), run=30001-30001msec

Patched:
  READ: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1127MiB (1182MB), run=30002-30002msec
 WRITE: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1128MiB (1183MB), run=30002-30002msec

That's an improvement from ~18k IOPS to ~150k IOPS, which is about the IOPS limit of the VM block device setup I'm testing on.
4kB block IO comparison: READ: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8868MiB (9299MB), run=30002-30002msec WRITE: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8878MiB (9309MB), run=30002-30002msec Which is ~150k IOPS, same as what the test gets for sub-block AIO+DIO writes with this patch. Signed-off-by: Dave Chinner --- fs/xfs/xfs_file.c | 94 +++++++++++++++++++++++++++++++--------------- fs/xfs/xfs_iomap.c | 32 +++++++++++----- 2 files changed, 86 insertions(+), 40 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index bba33be17eff..f5c75404b8a5 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -408,7 +408,7 @@ xfs_file_aio_write_checks( drained_dio = true; goto restart; } - + trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize); error = iomap_zero_range(inode, isize, iocb->ki_pos - isize, NULL, &xfs_buffered_write_iomap_ops); @@ -510,9 +510,9 @@ static const struct iomap_dio_ops xfs_dio_write_ops = { /* * Handle block aligned direct IO writes * - * Lock the inode appropriately to prepare for and issue a direct IO write. - * By separating it from the buffered write path we remove all the tricky to - * follow locking changes and looping. + * Lock the inode appropriately to prepare for and issue a direct IO write. By + * separating it from the buffered write path we remove all the tricky to follow + * locking changes and looping. * * If there are cached pages or we're extending the file, we need IOLOCK_EXCL * until we're sure the bytes at the new EOF have been zeroed and/or the cached @@ -578,18 +578,31 @@ xfs_file_dio_write_aligned( * allowing them to be done in parallel with reads and other direct IO writes. * However, if the IO is not aligned to filesystem blocks, the direct IO layer * may need to do sub-block zeroing and that requires serialisation against other - * direct IOs to the same block. In this case we need to serialise the - * submission of the unaligned IOs so that we don't get racing block zeroing in - * the dio layer. + * direct IOs to the same block. In the case where sub-block zeroing is not + * required, we can do concurrent sub-block dios to the same block successfully. + * + * Hence we have two cases here - the shared, optimisitic fast path for written + * extents, and everything else that needs exclusive IO path access across the + * entire IO. + * + * For the first case, we do all the checks we need at the mapping layer in the + * DIO code as part of the existing NOWAIT infrastructure. Hence all we need to + * do to support concurrent subblock dio is first try a non-blocking submission. + * If that returns -EAGAIN, then we simply repeat the IO submission with full + * IO exclusivity guaranteed so that we avoid racing sub-block zeroing. + * + * The only wrinkle in this case is that the iomap DIO code always does + * partial tail sub-block zeroing for post-EOF writes. Hence for any IO that + * _ends_ past the current EOF we need to run with full exclusivity. Note that + * we also check for the start of IO being beyond EOF because then zeroing + * between the old EOF and the start of the IO is required and that also + * requires exclusivity. Hence we avoid lock cycles and blocking under + * IOCB_NOWAIT for this situation, too. * - * To provide the same serialisation for AIO, we also need to wait for + * To provide the exclusivity required when using AIO, we also need to wait for * outstanding IOs to complete so that unwritten extent conversion is completed * before we try to map the overlapping block. 
This is currently implemented by * hitting it with a big hammer (i.e. inode_dio_wait()). - * - * This means that unaligned dio writes alwys block. There is no "nowait" fast - * path in this code - if IOCB_NOWAIT is set we simply return -EAGAIN up front - * and we don't have to worry about that anymore. */ static ssize_t xfs_file_dio_write_unaligned( @@ -597,23 +610,35 @@ xfs_file_dio_write_unaligned( struct kiocb *iocb, struct iov_iter *from) { - int iolock = XFS_IOLOCK_EXCL; + int iolock = XFS_IOLOCK_SHARED; size_t count; ssize_t ret; + size_t isize = i_size_read(VFS_I(ip)); struct iomap_dio_rw_args args = { .iocb = iocb, .iter = from, .ops = &xfs_direct_write_iomap_ops, .dops = &xfs_dio_write_ops, .wait_for_completion = is_sync_kiocb(iocb), - .nonblocking = (iocb->ki_flags & IOCB_NOWAIT), + .nonblocking = true, }; /* - * This must be the only IO in-flight. Wait on it before we - * release the iolock to prevent subsequent overlapping IO. + * Extending writes need exclusivity because of the sub-block zeroing + * that the DIO code always does for partial tail blocks beyond EOF. */ - args.wait_for_completion = true; + if (iocb->ki_pos > isize || iocb->ki_pos + count >= isize) { +retry_exclusive: + if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + iolock = XFS_IOLOCK_EXCL; + args.nonblocking = false; + args.wait_for_completion = true; + } + + ret = xfs_ilock_iocb(iocb, iolock); + if (ret) + return ret; /* * We can't properly handle unaligned direct I/O to reflink @@ -621,30 +646,37 @@ xfs_file_dio_write_unaligned( */ if (xfs_is_cow_inode(ip)) { trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count); - return -ENOTBLK; + ret = -ENOTBLK; + goto out_unlock; } - /* unaligned dio always waits, bail */ - if (iocb->ki_flags & IOCB_NOWAIT) - return -EAGAIN; - xfs_ilock(ip, iolock); - ret = xfs_file_aio_write_checks(iocb, from, &iolock); if (ret) - goto out; + goto out_unlock; count = iov_iter_count(from); /* - * If we are doing unaligned IO, we can't allow any other overlapping IO - * in-flight at the same time or we risk data corruption. Wait for all - * other IO to drain before we submit. If the IO is aligned, demote the - * iolock if we had to take the exclusive lock in - * xfs_file_aio_write_checks() for other reasons. + * If we are doing exclusive unaligned IO, we can't allow any other + * overlapping IO in-flight at the same time or we risk data corruption. + * Wait for all other IO to drain before we submit. */ - inode_dio_wait(VFS_I(ip)); + if (!args.nonblocking) + inode_dio_wait(VFS_I(ip)); trace_xfs_file_direct_write(ip, count, iocb->ki_pos); ret = iomap_dio_rw(&args); -out: + + /* + * Retry unaligned IO with exclusive blocking semantics if the DIO + * layer rejected it for mapping or locking reasons. If we are doing + * nonblocking user IO, propagate the error. + */ + if (ret == -EAGAIN) { + ASSERT(args.nonblocking == true); + xfs_iunlock(ip, iolock); + goto retry_exclusive; + } + +out_unlock: if (iolock) xfs_iunlock(ip, iolock); diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 7b9ff824e82d..e5659200e5e8 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -783,16 +783,30 @@ xfs_direct_write_iomap_begin( if (imap_needs_alloc(inode, flags, &imap, nimaps)) goto allocate_blocks; - /* - * NOWAIT IO needs to span the entire requested IO with a single map so - * that we avoid partial IO failures due to the rest of the IO range not - * covered by this map triggering an EAGAIN condition when it is - * subsequently mapped and aborting the IO. 
- */ - if ((flags & IOMAP_NOWAIT) && - !imap_spans_range(&imap, offset_fsb, end_fsb)) { + /* Handle special NOWAIT conditions for existing allocated extents. */ + if (flags & IOMAP_NOWAIT) { error = -EAGAIN; - goto out_unlock; + /* + * NOWAIT IO needs to span the entire requested IO with a single + * map so that we avoid partial IO failures due to the rest of + * the IO range not covered by this map triggering an EAGAIN + * condition when it is subsequently mapped and aborting the IO. + */ + if (!imap_spans_range(&imap, offset_fsb, end_fsb)) + goto out_unlock; + + /* + * If the IO is unaligned and the caller holds a shared IOLOCK, + * NOWAIT will be set because we can only do the IO if it spans + * a written extent. Otherwise we have to do sub-block zeroing, + * and that can only be done under an exclusive IOLOCK. Hence if + * this is not a written extent, return EAGAIN to tell the + * caller to try again. + */ + if (imap.br_state != XFS_EXT_NORM && + ((offset & mp->m_blockmask) || + ((offset + length) & mp->m_blockmask))) + goto out_unlock; } xfs_iunlock(ip, lockmode);
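Summarising the locking strategy as a rough sketch (the foofs_* helpers are hypothetical stand-ins; the real code is xfs_file_dio_write_unaligned() together with the xfs_direct_write_iomap_begin() check above): try a shared lock with non-blocking submission first, and only fall back to the exclusive, fully serialised path when the mapping layer reports that sub-block zeroing would be required:

static ssize_t foofs_dio_write_unaligned(struct kiocb *iocb,
                                         struct iov_iter *from)
{
        struct inode *inode = file_inode(iocb->ki_filp);
        bool exclusive;
        ssize_t ret;

        /* Extending writes always take the exclusive path (post-EOF zeroing). */
        exclusive = iocb->ki_pos + iov_iter_count(from) >= i_size_read(inode);

retry:
        if (exclusive && (iocb->ki_flags & IOCB_NOWAIT))
                return -EAGAIN;

        ret = foofs_lock(inode, exclusive);     /* hypothetical: trylock under NOWAIT */
        if (ret)
                return ret;

        if (exclusive)
                inode_dio_wait(inode);          /* drain overlapping in-flight DIO */

        /*
         * The shared path submits non-blocking; the mapping layer returns
         * -EAGAIN if the extent is unwritten or unallocated and would need
         * sub-block zeroing under an exclusive lock.
         */
        ret = foofs_submit_dio(iocb, from, !exclusive);   /* hypothetical */
        foofs_unlock(inode, exclusive);                   /* hypothetical */

        if (ret == -EAGAIN && !exclusive) {
                exclusive = true;
                goto retry;
        }
        return ret;
}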