diff mbox series

[2/4] ocfs2: fix races between hole punching and AIO+DIO

Message ID 20240331111744.7224-3-l@damenly.org (mailing list archive)
State New, archived
Headers show
Series ocfs2 bugs fixes exposed by fstests | expand

Commit Message

Su Yue March 31, 2024, 11:17 a.m. UTC
From: Su Yue <glass.su@suse.com>

After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block",
fstests/generic/300 become from always failed to sometimes failed:

========================================================================
[  473.293420 ] run fstests generic/300

[  475.296983 ] JBD2: Ignoring recovery information on journal
[  475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0)
with ordered data mode.
[  494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag:
Owner 5668 has an extent at cpos 78723 which can no longer be found
[  494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2
once the filesystem is unmounted.
[  494.292018 ] OCFS2: File system is now read-only.
[  494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272
ERROR: status = -30
[  494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374
ERROR: status = -3
fio: io_u error on file /mnt/scratch/racer: Read-only file system: write
offset=460849152, buflen=131072
=========================================================================

In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add
unwritten extents to a list. extents are also inserted into extent tree
in ocfs2_write_begin_nolock. Then another thread call fallocate to
puch a hole at one of the unwritten extent. The extent at cpos was
removed by ocfs2_remove_extent(). At end io worker thread,
ocfs2_search_extent_list found there is no such extent at the cpos.

    T1                        T2                T3
                              inode lock
                                ...
                                insert extents
                                ...
                              inode unlock
ocfs2_fallocate
 __ocfs2_change_file_space
  inode lock
  lock ip_alloc_sem
  ocfs2_remove_inode_range inode
   ocfs2_remove_btree_range
    ocfs2_remove_extent
    ^---remove the extent at cpos 78723
  ...
  unlock ip_alloc_sem
  inode unlock
                                       ocfs2_dio_end_io
                                        ocfs2_dio_end_io_write
                                         lock ip_alloc_sem
                                         ocfs2_mark_extent_written
                                          ocfs2_change_extent_flag
                                           ocfs2_search_extent_list
                                           ^---failed to find extent
                                          ...
                                          unlock ip_alloc_sem

In most filesystems, fallocate is not compatible with racing with
AIO+DIO, so fix it by adding to wait for all dio before
fallocate/punch_hole like ext4.

Signed-off-by: Su Yue <glass.su@suse.com>
---
 fs/ocfs2/file.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Joseph Qi April 1, 2024, 1:52 a.m. UTC | #1
On 3/31/24 7:17 PM, Su Yue wrote:
> From: Su Yue <glass.su@suse.com>
> 
> After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block",
> fstests/generic/300 become from always failed to sometimes failed:
> 
> ========================================================================
> [  473.293420 ] run fstests generic/300
> 
> [  475.296983 ] JBD2: Ignoring recovery information on journal
> [  475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0)
> with ordered data mode.
> [  494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag:
> Owner 5668 has an extent at cpos 78723 which can no longer be found
> [  494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2
> once the filesystem is unmounted.
> [  494.292018 ] OCFS2: File system is now read-only.
> [  494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272
> ERROR: status = -30
> [  494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374
> ERROR: status = -3
> fio: io_u error on file /mnt/scratch/racer: Read-only file system: write
> offset=460849152, buflen=131072
> =========================================================================
> 
> In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add
> unwritten extents to a list. extents are also inserted into extent tree
> in ocfs2_write_begin_nolock. Then another thread call fallocate to
> puch a hole at one of the unwritten extent. The extent at cpos was
> removed by ocfs2_remove_extent(). At end io worker thread,
> ocfs2_search_extent_list found there is no such extent at the cpos.
> 
>     T1                        T2                T3
>                               inode lock
>                                 ...
>                                 insert extents
>                                 ...
>                               inode unlock
> ocfs2_fallocate
>  __ocfs2_change_file_space
>   inode lock
>   lock ip_alloc_sem
>   ocfs2_remove_inode_range inode
>    ocfs2_remove_btree_range
>     ocfs2_remove_extent
>     ^---remove the extent at cpos 78723
>   ...
>   unlock ip_alloc_sem
>   inode unlock
>                                        ocfs2_dio_end_io
>                                         ocfs2_dio_end_io_write
>                                          lock ip_alloc_sem
>                                          ocfs2_mark_extent_written
>                                           ocfs2_change_extent_flag
>                                            ocfs2_search_extent_list
>                                            ^---failed to find extent
>                                           ...
>                                           unlock ip_alloc_sem
> 
> In most filesystems, fallocate is not compatible with racing with
> AIO+DIO, so fix it by adding to wait for all dio before
> fallocate/punch_hole like ext4.
> 
> Signed-off-by: Su Yue <glass.su@suse.com>

Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

> ---
>  fs/ocfs2/file.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 0da8e7bd3261..ccc57038a977 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1936,6 +1936,8 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  
>  	inode_lock(inode);
>  
> +	/* Wait all existing dio workers, newcomers will block on i_rwsem */
> +	inode_dio_wait(inode);
>  	/*
>  	 * This prevents concurrent writes on other nodes
>  	 */
diff mbox series

Patch

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 0da8e7bd3261..ccc57038a977 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1936,6 +1936,8 @@  static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 
 	inode_lock(inode);
 
+	/* Wait all existing dio workers, newcomers will block on i_rwsem */
+	inode_dio_wait(inode);
 	/*
 	 * This prevents concurrent writes on other nodes
 	 */