diff mbox series

[v2,2/3] btrfs: exit gracefully if reloc roots don't match

Message ID 33e75a646274b3c844744dfec54c46ae89aa3d33.1690970028.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix an ASSERT() triggered inside prepare_to_merge() | expand

Commit Message

Qu Wenruo Aug. 2, 2023, 10:02 a.m. UTC
[BUG]
Syzbot reported a crash that an ASSERT() got triggered inside
prepare_to_merge().

[CAUSE]
The root cause of the triggered ASSERT() is we can have a race between
quota tree creation and relocation.

This leads us to create a duplicated quota tree in the
btrfs_read_fs_root() path, and since it's treated as fs tree, it would
have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.

The bug itself is fixed by a dedicated patch for it, but this already
taught us the ASSERT() is not something straightforward for
developers.

[ENHANCEMENT]
Instead of using an ASSERT(), let's handle it gracefully and output
extra info about the mismatch reloc roots to help debug.

Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 40 ++++++++++++++++++++++++++++++++++------
 1 file changed, 34 insertions(+), 6 deletions(-)

Comments

Filipe Manana Aug. 2, 2023, 12:05 p.m. UTC | #1
On Wed, Aug 2, 2023 at 11:25 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> Syzbot reported a crash that an ASSERT() got triggered inside
> prepare_to_merge().
>
> [CAUSE]
> The root cause of the triggered ASSERT() is we can have a race between
> quota tree creation and relocation.
>
> This leads us to create a duplicated quota tree in the
> btrfs_read_fs_root() path, and since it's treated as fs tree, it would
> have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
>
> The bug itself is fixed by a dedicated patch for it, but this already
> taught us the ASSERT() is not something straightforward for
> developers.
>
> [ENHANCEMENT]
> Instead of using an ASSERT(), let's handle it gracefully and output
> extra info about the mismatch reloc roots to help debug.
>
> Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/relocation.c | 40 ++++++++++++++++++++++++++++++++++------
>  1 file changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 9db2e6fa2cb2..32a8bdc08488 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err)
>                                 err = PTR_ERR(root);
>                         break;
>                 }
> -               ASSERT(root->reloc_root == reloc_root);
> +
> +               if (unlikely(root->reloc_root != reloc_root)) {
> +                       if (root->reloc_root)
> +                               btrfs_err(fs_info,
> +"reloc tree mismatch, root %lld has reloc root key (%lld, %u, %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu",

Please remove the commas when printing keys, use (%llu %u %llu).
This is the style we follow (tree checker, ctree.c, etc).

> +                                         root->root_key.objectid,
> +                                         root->reloc_root->root_key.objectid,
> +                                         root->reloc_root->root_key.type,
> +                                         root->reloc_root->root_key.offset,
> +                                         btrfs_root_generation(
> +                                                 &root->reloc_root->root_item),
> +                                         reloc_root->root_key.objectid,
> +                                         reloc_root->root_key.type,
> +                                         reloc_root->root_key.offset,
> +                                         btrfs_root_generation(
> +                                                 &reloc_root->root_item));
> +                       else
> +                               btrfs_err(fs_info,
> +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld, %u, %llu) gen %llu",

Same here.

> +                                         root->root_key.objectid,
> +                                         reloc_root->root_key.objectid,
> +                                         reloc_root->root_key.type,
> +                                         reloc_root->root_key.offset,
> +                                         btrfs_root_generation(
> +                                                 &reloc_root->root_item));
> +                       list_add(&reloc_root->root_list, &reloc_roots);
> +                       btrfs_put_root(root);
> +                       btrfs_abort_transaction(trans, -EUCLEAN);
> +                       if (!err)
> +                               err = -EUCLEAN;
> +                       break;
> +               }
>
>                 /*
>                  * set reference count to 1, so btrfs_recover_relocation
> @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc)
>                                  * memory.  However there's no reason we can't
>                                  * handle the error properly here just in case.
>                                  */
> -                               ASSERT(0);
>                                 ret = PTR_ERR(root);
>                                 goto out;
>                         }
>                         if (root->reloc_root != reloc_root) {

As the ASSERT below is gone, maybe adding a WARN_ON(root->reloc_root
!= reloc_root) is helpful too, so that if this ever happens and
relocation fails with -EINVAL, it will point to this location.

>                                 /*
> -                                * This is actually impossible without something
> -                                * going really wrong (like weird race condition
> -                                * or cosmic rays).
> +                                * This can happen if on-disk data has some

data -> metadata

Other than that it looks fine to me, thanks.

> +                                * corruption, e.g. bad reloc tree key offset.
>                                  */
> -                               ASSERT(0);
>                                 ret = -EINVAL;
>                                 goto out;
>                         }
> --
> 2.41.0
>
diff mbox series

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9db2e6fa2cb2..32a8bdc08488 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1916,7 +1916,38 @@  int prepare_to_merge(struct reloc_control *rc, int err)
 				err = PTR_ERR(root);
 			break;
 		}
-		ASSERT(root->reloc_root == reloc_root);
+
+		if (unlikely(root->reloc_root != reloc_root)) {
+			if (root->reloc_root)
+				btrfs_err(fs_info,
+"reloc tree mismatch, root %lld has reloc root key (%lld, %u, %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu",
+					  root->root_key.objectid,
+					  root->reloc_root->root_key.objectid,
+					  root->reloc_root->root_key.type,
+					  root->reloc_root->root_key.offset,
+					  btrfs_root_generation(
+						  &root->reloc_root->root_item),
+					  reloc_root->root_key.objectid,
+					  reloc_root->root_key.type,
+					  reloc_root->root_key.offset,
+					  btrfs_root_generation(
+						  &reloc_root->root_item));
+			else
+				btrfs_err(fs_info,
+"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld, %u, %llu) gen %llu",
+					  root->root_key.objectid,
+					  reloc_root->root_key.objectid,
+					  reloc_root->root_key.type,
+					  reloc_root->root_key.offset,
+					  btrfs_root_generation(
+						  &reloc_root->root_item));
+			list_add(&reloc_root->root_list, &reloc_roots);
+			btrfs_put_root(root);
+			btrfs_abort_transaction(trans, -EUCLEAN);
+			if (!err)
+				err = -EUCLEAN;
+			break;
+		}
 
 		/*
 		 * set reference count to 1, so btrfs_recover_relocation
@@ -1998,17 +2029,14 @@  void merge_reloc_roots(struct reloc_control *rc)
 				 * memory.  However there's no reason we can't
 				 * handle the error properly here just in case.
 				 */
-				ASSERT(0);
 				ret = PTR_ERR(root);
 				goto out;
 			}
 			if (root->reloc_root != reloc_root) {
 				/*
-				 * This is actually impossible without something
-				 * going really wrong (like weird race condition
-				 * or cosmic rays).
+				 * This can happen if on-disk data has some
+				 * corruption, e.g. bad reloc tree key offset.
 				 */
-				ASSERT(0);
 				ret = -EINVAL;
 				goto out;
 			}