Message ID | f72bd0cd2198d017d31f7f797546944b2afdd4ab.1690970028.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix an ASSERT() triggered inside prepare_to_merge() | expand |
On Wed, Aug 2, 2023 at 12:20 PM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > Syzbot reported a weird ASSERT() triggered inside prepare_to_merge(). > > With extra debug output, it shows we're trying to relocate tree blocks > for quota root: > > BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 > ------------[ cut here ]------------ > BTRFS: Transaction aborted (error -117) So this message doesn't exist before the 2nd patch in series, and neither the transaction abort. What we have is an assert. I would suggest pasting here the assertion failure and stack trace reported at: https://lore.kernel.org/linux-btrfs/000000000000a3d67705ff730522@google.com/ So: assertion failed: root->reloc_root == reloc_root, in fs/btrfs/relocation.c:1919 ------------[ cut here ]------------ kernel BUG at fs/btrfs/relocation.c:1919! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9904 Comm: syz-executor.3 Not tainted 6.4.0-syzkaller-08881-g533925cb7604 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 RIP: 0010:prepare_to_merge+0xbb2/0xc40 fs/btrfs/relocation.c:1919 Code: fe e9 f5 (...) RSP: 0018:ffffc9000325f760 EFLAGS: 00010246 RAX: 000000000000004f RBX: ffff888075644030 RCX: 1481ccc522da5800 RDX: ffffc90005c09000 RSI: 00000000000364ca RDI: 00000000000364cb RBP: ffffc9000325f870 R08: ffffffff816f33ac R09: 1ffff9200064bea0 R10: dffffc0000000000 R11: fffff5200064bea1 R12: ffff888075644000 R13: ffff88803b166000 R14: ffff88803b166560 R15: ffff88803b166558 FS: 00007f4e305fd700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056080679c000 CR3: 00000000193ad000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> relocate_block_group+0xa5d/0xcd0 fs/btrfs/relocation.c:3749 btrfs_relocate_block_group+0x7ab/0xd70 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3283 __btrfs_balance+0x1b06/0x2690 fs/btrfs/volumes.c:4018 btrfs_balance+0xbdb/0x1120 fs/btrfs/volumes.c:4402 btrfs_ioctl_balance+0x496/0x7c0 fs/btrfs/ioctl.c:3604 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f4e2f88c389 This is what we normally do anyway. > > [CAUSE] > Normally we should not use reloc tree for quota root at all, as reloc > trees are only for subvolume trees. > > But there is a race between quota enabling and relocation, this happens > after commit 85724171b302 ("btrfs: fix the btrfs_get_global_root return value"). > > Before that commit, for quota and free space tree, we exit immediately > if we can not grab it from fs_info. > > But now we would try to read it from disk, just as if they are fs trees, > this sets ROOT_SHAREABLE flags in such race: > > Thread A | Thread B > ---------------------------------+------------------------------ > btrfs_quota_enable() | > | | btrfs_get_root_ref() > | | |- btrfs_get_global_root() > | | | Returned NULL > | | |- btrfs_lookup_fs_root() > | | | Returned NULL > |- btrfs_create_tree() | | > | Now quota root item is | | > | inserted | |- btrfs_read_tree_root() > | | | Got the newly inserted quota root > | | |- btrfs_init_fs_root() > | | | Set ROOT_SHAREABLE flag > > [FIX] > Get back to the old behavior by returning PTR_ERR(-ENOENT) if the target > objectid is not a subvolume tree or data reloc tree. > > Fixes: 85724171b302 ("btrfs: fix the btrfs_get_global_root return value") > Reported-and-tested-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com > Signed-off-by: Qu Wenruo <wqu@suse.com> Other than that, it looks good to me: Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > --- > fs/btrfs/disk-io.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index da51e5750443..5fd336c597e9 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1300,6 +1300,16 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, > root = btrfs_get_global_root(fs_info, objectid); > if (root) > return root; > + > + /* > + * If we're called for non-subvolume trees, and above function didn't > + * found one, do not try to read it from disk. > + * > + * This is mostly for fst and quota trees, which can change at runtime > + * and should only be grabbed from fs_info. > + */ > + if (!is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) > + return ERR_PTR(-ENOENT); > again: > root = btrfs_lookup_fs_root(fs_info, objectid); > if (root) { > -- > 2.41.0 >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index da51e5750443..5fd336c597e9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1300,6 +1300,16 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, root = btrfs_get_global_root(fs_info, objectid); if (root) return root; + + /* + * If we're called for non-subvolume trees, and above function didn't + * found one, do not try to read it from disk. + * + * This is mostly for fst and quota trees, which can change at runtime + * and should only be grabbed from fs_info. + */ + if (!is_fstree(objectid) && objectid != BTRFS_DATA_RELOC_TREE_OBJECTID) + return ERR_PTR(-ENOENT); again: root = btrfs_lookup_fs_root(fs_info, objectid); if (root) {
[BUG] Syzbot reported a weird ASSERT() triggered inside prepare_to_merge(). With extra debug output, it shows we're trying to relocate tree blocks for quota root: BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17 ------------[ cut here ]------------ BTRFS: Transaction aborted (error -117) [CAUSE] Normally we should not use reloc tree for quota root at all, as reloc trees are only for subvolume trees. But there is a race between quota enabling and relocation, this happens after commit 85724171b302 ("btrfs: fix the btrfs_get_global_root return value"). Before that commit, for quota and free space tree, we exit immediately if we can not grab it from fs_info. But now we would try to read it from disk, just as if they are fs trees, this sets ROOT_SHAREABLE flags in such race: Thread A | Thread B ---------------------------------+------------------------------ btrfs_quota_enable() | | | btrfs_get_root_ref() | | |- btrfs_get_global_root() | | | Returned NULL | | |- btrfs_lookup_fs_root() | | | Returned NULL |- btrfs_create_tree() | | | Now quota root item is | | | inserted | |- btrfs_read_tree_root() | | | Got the newly inserted quota root | | |- btrfs_init_fs_root() | | | Set ROOT_SHAREABLE flag [FIX] Get back to the old behavior by returning PTR_ERR(-ENOENT) if the target objectid is not a subvolume tree or data reloc tree. Fixes: 85724171b302 ("btrfs: fix the btrfs_get_global_root return value") Reported-and-tested-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 10 ++++++++++ 1 file changed, 10 insertions(+)