Message ID | 20211021163330.1886516-1-bfoster@redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers | show |
Series | xfs: punch out data fork delalloc blocks on COW writeback failure | expand |
On Thu, Oct 21, 2021 at 12:33:30PM -0400, Brian Foster wrote: > If writeback I/O to a COW extent fails, the COW fork blocks are > punched out and the data fork blocks left alone. It is possible for > COW fork blocks to overlap non-shared data fork blocks (due to > cowextsz hint prealloc), however, and writeback unconditionally maps > to the COW fork whenever blocks exist at the corresponding offset of > the page undergoing writeback. This means it's quite possible for a > COW fork extent to overlap delalloc data fork blocks, writeback to > convert and map to the COW fork blocks, writeback to fail, and > finally for ioend completion to cancel the COW fork blocks and leave > stale data fork delalloc blocks around in the inode. The blocks are > effectively stale because writeback failure also discards dirty page > state. > > If this occurs, it is likely to trigger assert failures, free space > accounting corruption and failures in unrelated file operations. For > example, a subsequent reflink attempt of the affected file to a new > target file will trip over the stale delalloc in the source file and > fail. Several of these issues are occasionally reproduced by > generic/648, but are reproducible on demand with the right sequence > of operations and timely I/O error injection. > > To fix this problem, update the ioend failure path to also punch out > underlying data fork delalloc blocks on I/O error. This is analogous > to the writeback submission failure path in xfs_discard_page() where > we might fail to map data fork delalloc blocks and consistent with > the successful COW writeback completion path, which is responsible > for unmapping from the data fork and remapping in COW fork blocks. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > > Note that I have an fstest to reproduce this problem reliably that I'll > post to the list shortly. Yay! > Brian > > fs/xfs/xfs_aops.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 34fc6148032a..c8c15c3c3147 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -82,6 +82,7 @@ xfs_end_ioend( > struct iomap_ioend *ioend) > { > struct xfs_inode *ip = XFS_I(ioend->io_inode); > + struct xfs_mount *mp = ip->i_mount; > xfs_off_t offset = ioend->io_offset; > size_t size = ioend->io_size; > unsigned int nofs_flag; > @@ -97,18 +98,26 @@ xfs_end_ioend( > /* > * Just clean up the in-memory structures if the fs has been shut down. > */ > - if (xfs_is_shutdown(ip->i_mount)) { > + if (xfs_is_shutdown(mp)) { > error = -EIO; > goto done; > } > > /* > - * Clean up any COW blocks on an I/O error. > + * Clean up all COW blocks and underlying data fork delalloc blocks on > + * I/O error. The delalloc punch is required because this ioend was > + * mapped to blocks in the COW fork and the associated pages are no > + * longer dirty. If we don't remove delalloc blocks here, they become > + * stale and can corrupt free space accounting on unmount. > */ > error = blk_status_to_errno(ioend->io_bio->bi_status); > if (unlikely(error)) { > - if (ioend->io_flags & IOMAP_F_SHARED) > + if (ioend->io_flags & IOMAP_F_SHARED) { > xfs_reflink_cancel_cow_range(ip, offset, size, true); > + xfs_bmap_punch_delalloc_range(ip, > + XFS_B_TO_FSBT(mp, offset), > + XFS_B_TO_FSB(mp, size)); This looks correct to me. I'll give this a spin with your testcase and report back. Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + } > goto done; > } > > -- > 2.31.1 >
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 34fc6148032a..c8c15c3c3147 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -82,6 +82,7 @@ xfs_end_ioend( struct iomap_ioend *ioend) { struct xfs_inode *ip = XFS_I(ioend->io_inode); + struct xfs_mount *mp = ip->i_mount; xfs_off_t offset = ioend->io_offset; size_t size = ioend->io_size; unsigned int nofs_flag; @@ -97,18 +98,26 @@ xfs_end_ioend( /* * Just clean up the in-memory structures if the fs has been shut down. */ - if (xfs_is_shutdown(ip->i_mount)) { + if (xfs_is_shutdown(mp)) { error = -EIO; goto done; } /* - * Clean up any COW blocks on an I/O error. + * Clean up all COW blocks and underlying data fork delalloc blocks on + * I/O error. The delalloc punch is required because this ioend was + * mapped to blocks in the COW fork and the associated pages are no + * longer dirty. If we don't remove delalloc blocks here, they become + * stale and can corrupt free space accounting on unmount. */ error = blk_status_to_errno(ioend->io_bio->bi_status); if (unlikely(error)) { - if (ioend->io_flags & IOMAP_F_SHARED) + if (ioend->io_flags & IOMAP_F_SHARED) { xfs_reflink_cancel_cow_range(ip, offset, size, true); + xfs_bmap_punch_delalloc_range(ip, + XFS_B_TO_FSBT(mp, offset), + XFS_B_TO_FSB(mp, size)); + } goto done; }
If writeback I/O to a COW extent fails, the COW fork blocks are punched out and the data fork blocks left alone. It is possible for COW fork blocks to overlap non-shared data fork blocks (due to cowextsz hint prealloc), however, and writeback unconditionally maps to the COW fork whenever blocks exist at the corresponding offset of the page undergoing writeback. This means it's quite possible for a COW fork extent to overlap delalloc data fork blocks, writeback to convert and map to the COW fork blocks, writeback to fail, and finally for ioend completion to cancel the COW fork blocks and leave stale data fork delalloc blocks around in the inode. The blocks are effectively stale because writeback failure also discards dirty page state. If this occurs, it is likely to trigger assert failures, free space accounting corruption and failures in unrelated file operations. For example, a subsequent reflink attempt of the affected file to a new target file will trip over the stale delalloc in the source file and fail. Several of these issues are occasionally reproduced by generic/648, but are reproducible on demand with the right sequence of operations and timely I/O error injection. To fix this problem, update the ioend failure path to also punch out underlying data fork delalloc blocks on I/O error. This is analogous to the writeback submission failure path in xfs_discard_page() where we might fail to map data fork delalloc blocks and consistent with the successful COW writeback completion path, which is responsible for unmapping from the data fork and remapping in COW fork blocks. Signed-off-by: Brian Foster <bfoster@redhat.com> --- Note that I have an fstest to reproduce this problem reliably that I'll post to the list shortly. Brian fs/xfs/xfs_aops.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)