Message ID | 20170403185307.6243-7-rgoldwyn@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon 03-04-17 13:53:05, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > Return EAGAIN if any of the following checks fail for direct I/O: > + i_rwsem is lockable > + Writing beyond end of file (will trigger allocation) > + Blocks are not allocated at the write location Patches seem to be missing your Signed-off-by tag... > @@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) > > iocb->private = &overwrite; > /* Check whether we do a DIO overwrite or not */ > - if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio && > - ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from))) > - overwrite = 1; > + if (o_direct && !unaligned_aio) { > + struct ext4_map_blocks map; > + if (ext4_blocks_mapped(inode, iocb->ki_pos, > + iov_iter_count(from), &map)) { > + /* To exclude unwritten extents, we need to check > + * m_flags. > + */ > + if (ext4_should_dioread_nolock(inode) && > + (map.m_flags & EXT4_MAP_MAPPED)) > + overwrite = 1; > + } else if (iocb->ki_flags & IOCB_NOWAIT) { > + ret = -EAGAIN; > + goto out; > + } > + } Actually, overwriting unwritten extents is relatively complex in ext4 as well. In particular we need to start a transaction and split out the written part of the extent. So I don't think we can easily support this without blocking. As a result I'd keep the condition for IOCB_NOWAIT the same as for overwrite IO. > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = { > .name = "ext2", > .mount = ext4_mount, > .kill_sb = kill_block_super, > - .fs_flags = FS_REQUIRES_DEV, > + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT, FS_NOWAIT looks a bit too generic given these are filesystem feature flags. Can we call it FS_NOWAIT_IO? Honza
On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote: > FS_NOWAIT looks a bit too generic given these are filesystem feature flags. > Can we call it FS_NOWAIT_IO? It's way to generic as it's a feature of the particular file_operations instance. But once we switch to using RWF_* we can just the existing per-op feature checks for thos and the per-fs flag should just go away.
On 04/04/2017 03:41 AM, Christoph Hellwig wrote: > On Tue, Apr 04, 2017 at 09:58:53AM +0200, Jan Kara wrote: >> FS_NOWAIT looks a bit too generic given these are filesystem feature flags. >> Can we call it FS_NOWAIT_IO? > > It's way to generic as it's a feature of the particular file_operations > instance. But once we switch to using RWF_* we can just the existing > per-op feature checks for thos and the per-fs flag should just go away. > I am working on incorporating RWF_* flags. However, I am not sure how RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of "blocking" information is with the filesystem, it is a per-filesystem flag to block out (EOPNOTSUPP) the filesystems which do not support it.
On Tue, Apr 04, 2017 at 01:41:09PM -0500, Goldwyn Rodrigues wrote: > I am working on incorporating RWF_* flags. However, I am not sure how > RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of > "blocking" information is with the filesystem, it is a per-filesystem > flag to block out (EOPNOTSUPP) the filesystems which do not support it. You need to check the flag in the actual read/write methods as the support for features on Linux is not a per-file_system_type thing.
On Mon 10-04-17 00:45:39, Christoph Hellwig wrote: > On Tue, Apr 04, 2017 at 01:41:09PM -0500, Goldwyn Rodrigues wrote: > > I am working on incorporating RWF_* flags. However, I am not sure how > > RWF_* flags would get rid of FS_NOWAIT/FS_NOWAIT_IO. Since most of > > "blocking" information is with the filesystem, it is a per-filesystem > > flag to block out (EOPNOTSUPP) the filesystems which do not support it. > > You need to check the flag in the actual read/write methods as the > support for features on Linux is not a per-file_system_type thing. I don't understand here. Do you want that all filesystems support NOWAIT direct IO? IMO that's not realistic and also not necessary. In reality different filesystems support different sets or operations and we have a precedens for that in various fallocate operations, rename exchange, or O_TMPFILE support... Honza
On Mon, Apr 10, 2017 at 02:37:50PM +0200, Jan Kara wrote: > I don't understand here. Do you want that all filesystems support NOWAIT > direct IO? No. Per-file_system_type is way to coarse grained. All feature flag needs to be per-file_operation at least for cases like ext4 with our without extents (or journal) XFS v4 vs v5, different NFS versions, etc. For RWF_* each file operation simply declares if the feature is supported not by rejecting unknown ones. FIEMAP does the same as do a few other interfaces.
On Mon 10-04-17 07:39:43, Christoph Hellwig wrote: > On Mon, Apr 10, 2017 at 02:37:50PM +0200, Jan Kara wrote: > > I don't understand here. Do you want that all filesystems support NOWAIT > > direct IO? > > No. Per-file_system_type is way to coarse grained. All feature flag > needs to be per-file_operation at least for cases like ext4 with our > without extents (or journal) XFS v4 vs v5, different NFS versions, etc. Ah, I see your point now. Thanks for patience. I think we could make this work by making generic_file_write/read_iter() refuse NOWAIT IO with EOPNOTSUPP and then only modify those few filesystems that implement their own iter helpers and will not initially support NOWAIT IO. Sounds easy enough. Honza
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 8210c1f..e223b9f 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -127,27 +127,22 @@ ext4_unaligned_aio(struct inode *inode, struct iov_iter *from, loff_t pos) return 0; } -/* Is IO overwriting allocated and initialized blocks? */ -static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len) +/* Are IO blocks allocated */ +static bool ext4_blocks_mapped(struct inode *inode, loff_t pos, loff_t len, + struct ext4_map_blocks *map) { - struct ext4_map_blocks map; unsigned int blkbits = inode->i_blkbits; int err, blklen; if (pos + len > i_size_read(inode)) return false; - map.m_lblk = pos >> blkbits; - map.m_len = EXT4_MAX_BLOCKS(len, pos, blkbits); - blklen = map.m_len; + map->m_lblk = pos >> blkbits; + map->m_len = EXT4_MAX_BLOCKS(len, pos, blkbits); + blklen = map->m_len; - err = ext4_map_blocks(NULL, inode, &map, 0); - /* - * 'err==len' means that all of the blocks have been preallocated, - * regardless of whether they have been initialized or not. To exclude - * unwritten extents, we need to check m_flags. - */ - return err == blklen && (map.m_flags & EXT4_MAP_MAPPED); + err = ext4_map_blocks(NULL, inode, map, 0); + return err == blklen; } static ssize_t ext4_write_checks(struct kiocb *iocb, struct iov_iter *from) @@ -204,6 +199,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); int o_direct = iocb->ki_flags & IOCB_DIRECT; + int nowait = iocb->ki_flags & IOCB_NOWAIT; int unaligned_aio = 0; int overwrite = 0; ssize_t ret; @@ -216,7 +212,13 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) return ext4_dax_write_iter(iocb, from); #endif - inode_lock(inode); + if (o_direct && nowait) { + if (!inode_trylock(inode)) + return -EAGAIN; + } else { + inode_lock(inode); + } + ret = ext4_write_checks(iocb, from); if (ret <= 0) goto out; @@ -235,9 +237,21 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from) iocb->private = &overwrite; /* Check whether we do a DIO overwrite or not */ - if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio && - ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from))) - overwrite = 1; + if (o_direct && !unaligned_aio) { + struct ext4_map_blocks map; + if (ext4_blocks_mapped(inode, iocb->ki_pos, + iov_iter_count(from), &map)) { + /* To exclude unwritten extents, we need to check + * m_flags. + */ + if (ext4_should_dioread_nolock(inode) && + (map.m_flags & EXT4_MAP_MAPPED)) + overwrite = 1; + } else if (iocb->ki_flags & IOCB_NOWAIT) { + ret = -EAGAIN; + goto out; + } + } ret = __generic_file_write_iter(iocb, from); inode_unlock(inode); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index a9448db..e1d424a 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -117,7 +117,7 @@ static struct file_system_type ext2_fs_type = { .name = "ext2", .mount = ext4_mount, .kill_sb = kill_block_super, - .fs_flags = FS_REQUIRES_DEV, + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT, }; MODULE_ALIAS_FS("ext2"); MODULE_ALIAS("ext2"); @@ -132,7 +132,7 @@ static struct file_system_type ext3_fs_type = { .name = "ext3", .mount = ext4_mount, .kill_sb = kill_block_super, - .fs_flags = FS_REQUIRES_DEV, + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT, }; MODULE_ALIAS_FS("ext3"); MODULE_ALIAS("ext3"); @@ -5623,7 +5623,7 @@ static struct file_system_type ext4_fs_type = { .name = "ext4", .mount = ext4_mount, .kill_sb = kill_block_super, - .fs_flags = FS_REQUIRES_DEV, + .fs_flags = FS_REQUIRES_DEV | FS_NOWAIT, }; MODULE_ALIAS_FS("ext4");
From: Goldwyn Rodrigues <rgoldwyn@suse.com> Return EAGAIN if any of the following checks fail for direct I/O: + i_rwsem is lockable + Writing beyond end of file (will trigger allocation) + Blocks are not allocated at the write location --- fs/ext4/file.c | 48 +++++++++++++++++++++++++++++++----------------- fs/ext4/super.c | 6 +++--- 2 files changed, 34 insertions(+), 20 deletions(-)