Message ID | 33e75a646274b3c844744dfec54c46ae89aa3d33.1690970028.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix an ASSERT() triggered inside prepare_to_merge() | expand |
On Wed, Aug 2, 2023 at 11:25 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Syzbot reported a crash that an ASSERT() got triggered inside > prepare_to_merge(). > > [CAUSE] > The root cause of the triggered ASSERT() is we can have a race between > quota tree creation and relocation. > > This leads us to create a duplicated quota tree in the > btrfs_read_fs_root() path, and since it's treated as fs tree, it would > have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. > > The bug itself is fixed by a dedicated patch for it, but this already > taught us the ASSERT() is not something straightforward for > developers. > > [ENHANCEMENT] > Instead of using an ASSERT(), let's handle it gracefully and output > extra info about the mismatch reloc roots to help debug. > > Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 40 ++++++++++++++++++++++++++++++++++------ > 1 file changed, 34 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 9db2e6fa2cb2..32a8bdc08488 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) > err = PTR_ERR(root); > break; > } > - ASSERT(root->reloc_root == reloc_root); > + > + if (unlikely(root->reloc_root != reloc_root)) { > + if (root->reloc_root) > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has reloc root key (%lld, %u, %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", Please remove the commas when printing keys, use (%llu %u %llu). This is the style we follow (tree checker, ctree.c, etc). > + root->root_key.objectid, > + root->reloc_root->root_key.objectid, > + root->reloc_root->root_key.type, > + root->reloc_root->root_key.offset, > + btrfs_root_generation( > + &root->reloc_root->root_item), > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + else > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld, %u, %llu) gen %llu", Same here. > + root->root_key.objectid, > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + list_add(&reloc_root->root_list, &reloc_roots); > + btrfs_put_root(root); > + btrfs_abort_transaction(trans, -EUCLEAN); > + if (!err) > + err = -EUCLEAN; > + break; > + } > > /* > * set reference count to 1, so btrfs_recover_relocation > @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) > * memory. However there's no reason we can't > * handle the error properly here just in case. > */ > - ASSERT(0); > ret = PTR_ERR(root); > goto out; > } > if (root->reloc_root != reloc_root) { As the ASSERT below is gone, maybe adding a WARN_ON(root->reloc_root != reloc_root) is helpful too, so that if this ever happens and relocation fails with -EINVAL, it will point to this location. > /* > - * This is actually impossible without something > - * going really wrong (like weird race condition > - * or cosmic rays). > + * This can happen if on-disk data has some data -> metadata Other than that it looks fine to me, thanks. > + * corruption, e.g. bad reloc tree key offset. > */ > - ASSERT(0); > ret = -EINVAL; > goto out; > } > -- > 2.41.0 >
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9db2e6fa2cb2..32a8bdc08488 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) err = PTR_ERR(root); break; } - ASSERT(root->reloc_root == reloc_root); + + if (unlikely(root->reloc_root != reloc_root)) { + if (root->reloc_root) + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has reloc root key (%lld, %u, %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", + root->root_key.objectid, + root->reloc_root->root_key.objectid, + root->reloc_root->root_key.type, + root->reloc_root->root_key.offset, + btrfs_root_generation( + &root->reloc_root->root_item), + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + else + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld, %u, %llu) gen %llu", + root->root_key.objectid, + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + list_add(&reloc_root->root_list, &reloc_roots); + btrfs_put_root(root); + btrfs_abort_transaction(trans, -EUCLEAN); + if (!err) + err = -EUCLEAN; + break; + } /* * set reference count to 1, so btrfs_recover_relocation @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) * memory. However there's no reason we can't * handle the error properly here just in case. */ - ASSERT(0); ret = PTR_ERR(root); goto out; } if (root->reloc_root != reloc_root) { /* - * This is actually impossible without something - * going really wrong (like weird race condition - * or cosmic rays). + * This can happen if on-disk data has some + * corruption, e.g. bad reloc tree key offset. */ - ASSERT(0); ret = -EINVAL; goto out; }
[BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). [CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation. This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers. [ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug. Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 40 ++++++++++++++++++++++++++++++++++------ 1 file changed, 34 insertions(+), 6 deletions(-)