diff mbox series

btrfs: don't drop extent_map for free space inode on write error

Message ID 280a4b489f58586b20bd195c84cf396098a784d5.1706729623.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: don't drop extent_map for free space inode on write error | expand

Commit Message

Josef Bacik Jan. 31, 2024, 7:33 p.m. UTC
While running the CI for an unrelated change I hit the following panic
with generic/648 on btrfs_holes_spacecache.

assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:1385!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G        W          6.8.0-rc2+ #1
RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0
Call Trace:
 <TASK>
 extent_write_cache_pages+0x2ac/0x8f0
 extent_writepages+0x87/0x110
 do_writepages+0xd5/0x1f0
 filemap_fdatawrite_wbc+0x63/0x90
 __filemap_fdatawrite_range+0x5c/0x80
 btrfs_fdatawrite_range+0x1f/0x50
 btrfs_write_out_cache+0x507/0x560
 btrfs_write_dirty_block_groups+0x32a/0x420
 commit_cowonly_roots+0x21b/0x290
 btrfs_commit_transaction+0x813/0x1360
 btrfs_sync_file+0x51a/0x640
 __x64_sys_fdatasync+0x52/0x90
 do_syscall_64+0x9c/0x190
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

This happens because we fail to write out the free space cache in one
instance, come back around and attempt to write it again.  However on
the second pass through we go to call btrfs_get_extent() on the inode to
get the extent mapping.  Because this is a new block group, and with the
free space inode we always search the commit root to avoid deadlocking
with the tree, we find nothing and return a EXTENT_MAP_HOLE for the
requested range.

This happens because the first time we try to write the space cache out
we hit an error, and on an error we drop the extent mapping.  This is
normal for normal files, but the free space cache inode is special.  We
always expect the extent map to be correct.  Thus the second time
through we end up with a bogus extent map.

Since we're deprecating this feature, the most straightforward way to
fix this is to simply skip dropping the extent map range for this failed
range.

I shortened the test by using error injection to stress the area to make
it easier to reproduce.  With this patch in place we no longer panic
with my error injection test.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Filipe Manana Feb. 1, 2024, 11:39 a.m. UTC | #1
On Wed, Jan 31, 2024 at 7:34 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> While running the CI for an unrelated change I hit the following panic
> with generic/648 on btrfs_holes_spacecache.
>
> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:1385!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G        W          6.8.0-rc2+ #1
> RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0
> Call Trace:
>  <TASK>
>  extent_write_cache_pages+0x2ac/0x8f0
>  extent_writepages+0x87/0x110
>  do_writepages+0xd5/0x1f0
>  filemap_fdatawrite_wbc+0x63/0x90
>  __filemap_fdatawrite_range+0x5c/0x80
>  btrfs_fdatawrite_range+0x1f/0x50
>  btrfs_write_out_cache+0x507/0x560
>  btrfs_write_dirty_block_groups+0x32a/0x420
>  commit_cowonly_roots+0x21b/0x290
>  btrfs_commit_transaction+0x813/0x1360
>  btrfs_sync_file+0x51a/0x640
>  __x64_sys_fdatasync+0x52/0x90
>  do_syscall_64+0x9c/0x190
>  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> This happens because we fail to write out the free space cache in one
> instance, come back around and attempt to write it again.  However on
> the second pass through we go to call btrfs_get_extent() on the inode to
> get the extent mapping.  Because this is a new block group, and with the
> free space inode we always search the commit root to avoid deadlocking
> with the tree, we find nothing and return a EXTENT_MAP_HOLE for the
> requested range.
>
> This happens because the first time we try to write the space cache out
> we hit an error, and on an error we drop the extent mapping.  This is
> normal for normal files, but the free space cache inode is special.  We
> always expect the extent map to be correct.  Thus the second time
> through we end up with a bogus extent map.
>
> Since we're deprecating this feature, the most straightforward way to
> fix this is to simply skip dropping the extent map range for this failed
> range.
>
> I shortened the test by using error injection to stress the area to make
> it easier to reproduce.  With this patch in place we no longer panic
> with my error injection test.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/inode.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 6734717350e3..ca26e2c267f2 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3182,8 +3182,23 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
>                         unwritten_start += logical_len;
>                 clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
>
> -               /* Drop extent maps for the part of the extent we didn't write. */
> -               btrfs_drop_extent_map_range(inode, unwritten_start, end, false);
> +               /*
> +                * Drop extent maps for the part of the extent we didn't write.
> +                *
> +                * We have an exception here for the free_space_inode, this is
> +                * because when we do btrfs_get_extent() on the free space inode
> +                * we will search the commit root.  If this is a new block group
> +                * we won't find anything, and we will trip over the assert in
> +                * writepage where we do ASSERT(em->block_start !=
> +                * EXTENT_MAP_HOLE).
> +                *
> +                * Theoretically we could also skip this for any NOCOW extent as
> +                * we don't mess with the extent map tree in the NOCOW case, but
> +                * for now simply skip this if we are the free space inode.
> +                */
> +               if (!btrfs_is_free_space_inode(inode))
> +                       btrfs_drop_extent_map_range(inode, unwritten_start,
> +                                                   end, false);
>
>                 /*
>                  * If the ordered extent had an IOERR or something else went
> --
> 2.43.0
>
>
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6734717350e3..ca26e2c267f2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3182,8 +3182,23 @@  int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
 			unwritten_start += logical_len;
 		clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
 
-		/* Drop extent maps for the part of the extent we didn't write. */
-		btrfs_drop_extent_map_range(inode, unwritten_start, end, false);
+		/*
+		 * Drop extent maps for the part of the extent we didn't write.
+		 *
+		 * We have an exception here for the free_space_inode, this is
+		 * because when we do btrfs_get_extent() on the free space inode
+		 * we will search the commit root.  If this is a new block group
+		 * we won't find anything, and we will trip over the assert in
+		 * writepage where we do ASSERT(em->block_start !=
+		 * EXTENT_MAP_HOLE).
+		 *
+		 * Theoretically we could also skip this for any NOCOW extent as
+		 * we don't mess with the extent map tree in the NOCOW case, but
+		 * for now simply skip this if we are the free space inode.
+		 */
+		if (!btrfs_is_free_space_inode(inode))
+			btrfs_drop_extent_map_range(inode, unwritten_start,
+						    end, false);
 
 		/*
 		 * If the ordered extent had an IOERR or something else went