diff mbox series

xfs: punch out data fork delalloc blocks on COW writeback failure

Message ID 20211021163330.1886516-1-bfoster@redhat.com (mailing list archive)
State Accepted, archived
Headers show
Series xfs: punch out data fork delalloc blocks on COW writeback failure | expand

Commit Message

Brian Foster Oct. 21, 2021, 4:33 p.m. UTC
If writeback I/O to a COW extent fails, the COW fork blocks are
punched out and the data fork blocks left alone. It is possible for
COW fork blocks to overlap non-shared data fork blocks (due to
cowextsz hint prealloc), however, and writeback unconditionally maps
to the COW fork whenever blocks exist at the corresponding offset of
the page undergoing writeback. This means it's quite possible for a
COW fork extent to overlap delalloc data fork blocks, writeback to
convert and map to the COW fork blocks, writeback to fail, and
finally for ioend completion to cancel the COW fork blocks and leave
stale data fork delalloc blocks around in the inode. The blocks are
effectively stale because writeback failure also discards dirty page
state.

If this occurs, it is likely to trigger assert failures, free space
accounting corruption and failures in unrelated file operations. For
example, a subsequent reflink attempt of the affected file to a new
target file will trip over the stale delalloc in the source file and
fail. Several of these issues are occasionally reproduced by
generic/648, but are reproducible on demand with the right sequence
of operations and timely I/O error injection.

To fix this problem, update the ioend failure path to also punch out
underlying data fork delalloc blocks on I/O error. This is analogous
to the writeback submission failure path in xfs_discard_page() where
we might fail to map data fork delalloc blocks and consistent with
the successful COW writeback completion path, which is responsible
for unmapping from the data fork and remapping in COW fork blocks.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Note that I have an fstest to reproduce this problem reliably that I'll
post to the list shortly.

Brian

 fs/xfs/xfs_aops.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Darrick J. Wong Oct. 21, 2021, 6:42 p.m. UTC | #1
On Thu, Oct 21, 2021 at 12:33:30PM -0400, Brian Foster wrote:
> If writeback I/O to a COW extent fails, the COW fork blocks are
> punched out and the data fork blocks left alone. It is possible for
> COW fork blocks to overlap non-shared data fork blocks (due to
> cowextsz hint prealloc), however, and writeback unconditionally maps
> to the COW fork whenever blocks exist at the corresponding offset of
> the page undergoing writeback. This means it's quite possible for a
> COW fork extent to overlap delalloc data fork blocks, writeback to
> convert and map to the COW fork blocks, writeback to fail, and
> finally for ioend completion to cancel the COW fork blocks and leave
> stale data fork delalloc blocks around in the inode. The blocks are
> effectively stale because writeback failure also discards dirty page
> state.
> 
> If this occurs, it is likely to trigger assert failures, free space
> accounting corruption and failures in unrelated file operations. For
> example, a subsequent reflink attempt of the affected file to a new
> target file will trip over the stale delalloc in the source file and
> fail. Several of these issues are occasionally reproduced by
> generic/648, but are reproducible on demand with the right sequence
> of operations and timely I/O error injection.
> 
> To fix this problem, update the ioend failure path to also punch out
> underlying data fork delalloc blocks on I/O error. This is analogous
> to the writeback submission failure path in xfs_discard_page() where
> we might fail to map data fork delalloc blocks and consistent with
> the successful COW writeback completion path, which is responsible
> for unmapping from the data fork and remapping in COW fork blocks.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> Note that I have an fstest to reproduce this problem reliably that I'll
> post to the list shortly.

Yay!

> Brian
> 
>  fs/xfs/xfs_aops.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 34fc6148032a..c8c15c3c3147 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -82,6 +82,7 @@ xfs_end_ioend(
>  	struct iomap_ioend	*ioend)
>  {
>  	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
> +	struct xfs_mount	*mp = ip->i_mount;
>  	xfs_off_t		offset = ioend->io_offset;
>  	size_t			size = ioend->io_size;
>  	unsigned int		nofs_flag;
> @@ -97,18 +98,26 @@ xfs_end_ioend(
>  	/*
>  	 * Just clean up the in-memory structures if the fs has been shut down.
>  	 */
> -	if (xfs_is_shutdown(ip->i_mount)) {
> +	if (xfs_is_shutdown(mp)) {
>  		error = -EIO;
>  		goto done;
>  	}
>  
>  	/*
> -	 * Clean up any COW blocks on an I/O error.
> +	 * Clean up all COW blocks and underlying data fork delalloc blocks on
> +	 * I/O error. The delalloc punch is required because this ioend was
> +	 * mapped to blocks in the COW fork and the associated pages are no
> +	 * longer dirty. If we don't remove delalloc blocks here, they become
> +	 * stale and can corrupt free space accounting on unmount.
>  	 */
>  	error = blk_status_to_errno(ioend->io_bio->bi_status);
>  	if (unlikely(error)) {
> -		if (ioend->io_flags & IOMAP_F_SHARED)
> +		if (ioend->io_flags & IOMAP_F_SHARED) {
>  			xfs_reflink_cancel_cow_range(ip, offset, size, true);
> +			xfs_bmap_punch_delalloc_range(ip,
> +						      XFS_B_TO_FSBT(mp, offset),
> +						      XFS_B_TO_FSB(mp, size));

This looks correct to me.  I'll give this a spin with your testcase and
report back.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +		}
>  		goto done;
>  	}
>  
> -- 
> 2.31.1
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 34fc6148032a..c8c15c3c3147 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -82,6 +82,7 @@  xfs_end_ioend(
 	struct iomap_ioend	*ioend)
 {
 	struct xfs_inode	*ip = XFS_I(ioend->io_inode);
+	struct xfs_mount	*mp = ip->i_mount;
 	xfs_off_t		offset = ioend->io_offset;
 	size_t			size = ioend->io_size;
 	unsigned int		nofs_flag;
@@ -97,18 +98,26 @@  xfs_end_ioend(
 	/*
 	 * Just clean up the in-memory structures if the fs has been shut down.
 	 */
-	if (xfs_is_shutdown(ip->i_mount)) {
+	if (xfs_is_shutdown(mp)) {
 		error = -EIO;
 		goto done;
 	}
 
 	/*
-	 * Clean up any COW blocks on an I/O error.
+	 * Clean up all COW blocks and underlying data fork delalloc blocks on
+	 * I/O error. The delalloc punch is required because this ioend was
+	 * mapped to blocks in the COW fork and the associated pages are no
+	 * longer dirty. If we don't remove delalloc blocks here, they become
+	 * stale and can corrupt free space accounting on unmount.
 	 */
 	error = blk_status_to_errno(ioend->io_bio->bi_status);
 	if (unlikely(error)) {
-		if (ioend->io_flags & IOMAP_F_SHARED)
+		if (ioend->io_flags & IOMAP_F_SHARED) {
 			xfs_reflink_cancel_cow_range(ip, offset, size, true);
+			xfs_bmap_punch_delalloc_range(ip,
+						      XFS_B_TO_FSBT(mp, offset),
+						      XFS_B_TO_FSB(mp, size));
+		}
 		goto done;
 	}