Message ID | 20240410150313.2820364-1-yi.zhang@huaweicloud.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ext4: use iomap for regular file's buffered IO path and enable large folio | expand |
On Wed, Apr 10, 2024 at 11:03:08PM +0800, Zhang Yi wrote: > From: Zhang Yi <yi.zhang@huawei.com> > > Online defrag doesn't support iomap path yet, we have to fall back to > buffer_head path for the inode which has been using iomap. Changing > active inode is dangerous, before we start, we must hold the inode lock > and the mapping->invalidate_lock, and writeback all dirty folios and > drop the inode's pagecache. Even then, I don't think this is obviously safe. We went through this with DAX and we couldn't make it work safely. Just return EOPNOTSUPP to the online defrag ioctl if iomap is in use - that avoids all the excitement involved in doing dangerous things like swapping aops structures on actively referenced inodes... -Dave.
On 2024/5/1 17:32, Dave Chinner wrote: > On Wed, Apr 10, 2024 at 11:03:08PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@huawei.com> >> >> Online defrag doesn't support iomap path yet, we have to fall back to >> buffer_head path for the inode which has been using iomap. Changing >> active inode is dangerous, before we start, we must hold the inode lock >> and the mapping->invalidate_lock, and writeback all dirty folios and >> drop the inode's pagecache. > > Even then, I don't think this is obviously safe. We went through > this with DAX and we couldn't make it work safely. > > Just return EOPNOTSUPP to the online defrag ioctl if iomap is in use > - that avoids all the excitement involved in doing dangerous things > like swapping aops structures on actively referenced inodes... > Okay, this is just a temporary solution to support defrag. I've been looking at how to support defrag for iomap recently, I hope it could be supported in the near future, so let's drop this dangerous operation. Thanks, Yi.
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c index 7cd4afa4de1d..3db255385367 100644 --- a/fs/ext4/move_extent.c +++ b/fs/ext4/move_extent.c @@ -538,6 +538,34 @@ mext_check_arguments(struct inode *orig_inode, return 0; } +/* + * Disable buffered iomap path for the inode that requiring move extents, + * fallback to buffer_head path. + */ +static int ext4_disable_buffered_iomap_aops(struct inode *inode) +{ + int err; + + /* + * The buffered_head aops don't know how to handle folios + * dirtied by iomap, so before falling back, flush all dirty + * folios the inode has. + */ + filemap_invalidate_lock(inode->i_mapping); + err = filemap_write_and_wait(inode->i_mapping); + if (err < 0) { + filemap_invalidate_unlock(inode->i_mapping); + return err; + } + truncate_inode_pages(inode->i_mapping, 0); + + ext4_clear_inode_state(inode, EXT4_STATE_BUFFERED_IOMAP); + ext4_set_aops(inode); + filemap_invalidate_unlock(inode->i_mapping); + + return 0; +} + /** * ext4_move_extents - Exchange the specified range of a file * @@ -609,6 +637,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk, inode_dio_wait(orig_inode); inode_dio_wait(donor_inode); + /* Fallback to buffer_head aops for inodes with buffered iomap aops */ + if (ext4_test_inode_state(orig_inode, EXT4_STATE_BUFFERED_IOMAP)) + ext4_disable_buffered_iomap_aops(orig_inode); + if (ext4_test_inode_state(donor_inode, EXT4_STATE_BUFFERED_IOMAP)) + ext4_disable_buffered_iomap_aops(donor_inode); + /* Protect extent tree against block allocations via delalloc */ ext4_double_down_write_data_sem(orig_inode, donor_inode); /* Check the filesystem environment whether move_extent can be done */