Message ID | b54f02c1204998a7dfa4e284af07f96365b99467.1691042474.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix an ASSERT() triggered inside prepare_to_merge() | expand |
On Thu, Aug 3, 2023 at 7:40 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Syzbot reported a crash that an ASSERT() got triggered inside > prepare_to_merge(). > > [CAUSE] > The root cause of the triggered ASSERT() is we can have a race between > quota tree creation and relocation. > > This leads us to create a duplicated quota tree in the > btrfs_read_fs_root() path, and since it's treated as fs tree, it would > have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. > > The bug itself is fixed by a dedicated patch for it, but this already > taught us the ASSERT() is not something straightforward for > developers. > > [ENHANCEMENT] > Instead of using an ASSERT(), let's handle it gracefully and output > extra info about the mismatch reloc roots to help debug. > > Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside > merge_reloc_roots() later. > Also replace those ASSERT(0)s with WARN_ON()s. > > Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++-------- > 1 file changed, 36 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 9db2e6fa2cb2..28139a47c227 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) > err = PTR_ERR(root); > break; > } > - ASSERT(root->reloc_root == reloc_root); > + > + if (unlikely(root->reloc_root != reloc_root)) { > + if (root->reloc_root) > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", My comment about printing the key elements with commas still remains, there's still one key printed as "(%lld, %u, %llu)" while the other cases don't use the commas. Otherwise it looks good, thanks. > + root->root_key.objectid, > + root->reloc_root->root_key.objectid, > + root->reloc_root->root_key.type, > + root->reloc_root->root_key.offset, > + btrfs_root_generation( > + &root->reloc_root->root_item), > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + else > + btrfs_err(fs_info, > +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu", > + root->root_key.objectid, > + reloc_root->root_key.objectid, > + reloc_root->root_key.type, > + reloc_root->root_key.offset, > + btrfs_root_generation( > + &reloc_root->root_item)); > + list_add(&reloc_root->root_list, &reloc_roots); > + btrfs_put_root(root); > + btrfs_abort_transaction(trans, -EUCLEAN); > + if (!err) > + err = -EUCLEAN; > + break; > + } > > /* > * set reference count to 1, so btrfs_recover_relocation > @@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc) > root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, > false); > if (btrfs_root_refs(&reloc_root->root_item) > 0) { > - if (IS_ERR(root)) { > + if (WARN_ON(IS_ERR(root))) { > /* > * For recovery we read the fs roots on mount, > * and if we didn't find the root then we marked > @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) > * memory. However there's no reason we can't > * handle the error properly here just in case. > */ > - ASSERT(0); > ret = PTR_ERR(root); > goto out; > } > - if (root->reloc_root != reloc_root) { > + if (WARN_ON(root->reloc_root != reloc_root)) { > /* > - * This is actually impossible without something > - * going really wrong (like weird race condition > - * or cosmic rays). > + * This can happen if on-disk metadata has some > + * corruption, e.g. bad reloc tree key offset. > */ > - ASSERT(0); > ret = -EINVAL; > goto out; > } > -- > 2.41.0 >
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 9db2e6fa2cb2..28139a47c227 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1916,7 +1916,38 @@ int prepare_to_merge(struct reloc_control *rc, int err) err = PTR_ERR(root); break; } - ASSERT(root->reloc_root == reloc_root); + + if (unlikely(root->reloc_root != reloc_root)) { + if (root->reloc_root) + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has reloc root key (%lld %u %llu) gen %llu, expect reloc root key (%lld, %u, %llu) gen %llu", + root->root_key.objectid, + root->reloc_root->root_key.objectid, + root->reloc_root->root_key.type, + root->reloc_root->root_key.offset, + btrfs_root_generation( + &root->reloc_root->root_item), + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + else + btrfs_err(fs_info, +"reloc tree mismatch, root %lld has no reloc root, expect reloc root key (%lld %u %llu) gen %llu", + root->root_key.objectid, + reloc_root->root_key.objectid, + reloc_root->root_key.type, + reloc_root->root_key.offset, + btrfs_root_generation( + &reloc_root->root_item)); + list_add(&reloc_root->root_list, &reloc_roots); + btrfs_put_root(root); + btrfs_abort_transaction(trans, -EUCLEAN); + if (!err) + err = -EUCLEAN; + break; + } /* * set reference count to 1, so btrfs_recover_relocation @@ -1989,7 +2020,7 @@ void merge_reloc_roots(struct reloc_control *rc) root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, false); if (btrfs_root_refs(&reloc_root->root_item) > 0) { - if (IS_ERR(root)) { + if (WARN_ON(IS_ERR(root))) { /* * For recovery we read the fs roots on mount, * and if we didn't find the root then we marked @@ -1998,17 +2029,14 @@ void merge_reloc_roots(struct reloc_control *rc) * memory. However there's no reason we can't * handle the error properly here just in case. */ - ASSERT(0); ret = PTR_ERR(root); goto out; } - if (root->reloc_root != reloc_root) { + if (WARN_ON(root->reloc_root != reloc_root)) { /* - * This is actually impossible without something - * going really wrong (like weird race condition - * or cosmic rays). + * This can happen if on-disk metadata has some + * corruption, e.g. bad reloc tree key offset. */ - ASSERT(0); ret = -EINVAL; goto out; }
[BUG] Syzbot reported a crash that an ASSERT() got triggered inside prepare_to_merge(). [CAUSE] The root cause of the triggered ASSERT() is we can have a race between quota tree creation and relocation. This leads us to create a duplicated quota tree in the btrfs_read_fs_root() path, and since it's treated as fs tree, it would have ROOT_SHAREABLE flag, causing us to create a reloc tree for it. The bug itself is fixed by a dedicated patch for it, but this already taught us the ASSERT() is not something straightforward for developers. [ENHANCEMENT] Instead of using an ASSERT(), let's handle it gracefully and output extra info about the mismatch reloc roots to help debug. Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside merge_reloc_roots() later. Also replace those ASSERT(0)s with WARN_ON()s. Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 44 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 36 insertions(+), 8 deletions(-)