Message ID | 20191212003043.31093-5-rgoldwyn@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs direct-io using iomap | expand |
On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > Filesystems such as btrfs can perform direct I/O without holding the > inode->i_rwsem in some of the cases like writing within i_size. How is that safe? > + lockdep_assert_held(&file_inode(file)->i_rwsem); Having the asserts in the callers is pointless. The assert is inside the iomap helper to ensure the expected calling conventions, as the code is written under the assumption that we have i_rwsem.
On 1:50 12/12, Christoph Hellwig wrote: > On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote: > > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > > > Filesystems such as btrfs can perform direct I/O without holding the > > inode->i_rwsem in some of the cases like writing within i_size. > > How is that safe? This (inode_lock release) is only done for writes within i_size. We only have to safeguard write against truncates, which is done by inode_dio_wait() call in the truncate sequence (I had mistakenly removed it in patch 8/8, I shall reinstate that). The commit that introduced this optimization is 38851cc19adb ("Btrfs: implement unlocked dio write") > > > + lockdep_assert_held(&file_inode(file)->i_rwsem); > > Having the asserts in the callers is pointless. The assert is inside > the iomap helper to ensure the expected calling conventions, as the > code is written under the assumption that we have i_rwsem. Hmm, conflicting opinions from you and Dave. Anyways, I have removed it in individual filesystems.
On Thu, Dec 12, 2019 at 01:50:44AM -0800, Christoph Hellwig wrote: > On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote: > > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > > > Filesystems such as btrfs can perform direct I/O without holding the > > inode->i_rwsem in some of the cases like writing within i_size. > > How is that safe? > > > + lockdep_assert_held(&file_inode(file)->i_rwsem); > > Having the asserts in the callers is pointless. The assert is inside > the iomap helper to ensure the expected calling conventions, as the > code is written under the assumption that we have i_rwsem. It's written under the assumption that the caller has already performed the appropriate locking they require for serialisation against other operations on that inode. The fact that the filesystems up to this point all used the i_rwsem is largely irrelevant, and filesystems don't have to use the i_rwsem to serialise their IO. e.g. go back a handful of years and this would have needed to take into account an XFS specific rwsem, not the VFS inode mutex... Indeed, the IO range locking patches I have for XFS get rid of this lockdep assert in iomap because we no longer use the i_rwsem for IO serialisation in XFS - we go back to using an internal XFS construct for IO serialisation and don't use the i_rwsem at all. Cheers, Dave.
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 9d58295ccf7a..d0517a78640b 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -771,6 +771,8 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to) if (ret) goto out_uninit; + lockdep_assert_held(&file_inode(file)->i_rwsem); + ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); @@ -807,6 +809,8 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) if (offset + len > i_size_read(&ip->i_inode)) goto out; + lockdep_assert_held(&inode->i_rwsem); + ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 1a3bf3bd86fb..41c1e7c20a1f 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -415,8 +415,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct blk_plug plug; struct iomap_dio *dio; - lockdep_assert_held(&inode->i_rwsem); - if (!count) return 0; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index c93250108952..dfaccd4d6e6c 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -176,7 +176,8 @@ xfs_file_dio_aio_read( struct kiocb *iocb, struct iov_iter *to) { - struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + struct inode *inode = file_inode(iocb->ki_filp); + struct xfs_inode *ip = XFS_I(inode); size_t count = iov_iter_count(to); ssize_t ret; @@ -188,6 +189,9 @@ xfs_file_dio_aio_read( file_accessed(iocb->ki_filp); xfs_ilock(ip, XFS_IOLOCK_SHARED); + + lockdep_assert_held(&inode->i_rwsem); + ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, is_sync_kiocb(iocb)); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -547,6 +551,9 @@ xfs_file_dio_aio_write( } trace_xfs_file_direct_write(ip, count, iocb->ki_pos); + + lockdep_assert_held(&inode->i_rwsem); + /* * If unaligned, this is the only IO in-flight. Wait on it before we * release the iolock to prevent subsequent overlapping IO.