Message ID | 2b3c3fd6b5a6b9f9a7aa39cd343b233a11495bce.1708707655.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix double free of anonymous device after snapshot creation failure | expand |
On Fri, Feb 23, 2024 at 05:03:19PM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > When creating a snapshot we may do a double free of an anonymous device > in case there's an error comitting the transaction. The second free may > result in freeing an anonymous device number that was allocated by some > other subsystem in the kernel or another btrfs filesystem. > > The steps that lead to this: > > 1) At ioctl.c:create_snapshot() we allocate an anonymous device number > and assign it to pending_snapshot->anon_dev; > > 2) Then we call btrfs_commit_transaction() and end up at > transaction.c:create_pending_snapshot(); > > 3) There we call btrfs_get_new_fs_root() and pass it the anonymous device > number stored in pending_snapshot->anon_dev; > > 4) btrfs_get_new_fs_root() frees that anonymous device number because > btrfs_lookup_fs_root() returned a root - someone else did a lookup > of the new root already, which could some task doing backref walking; > > 5) After that some error happens in the transaction commit path, and at > ioctl.c:create_snapshot() we jump to the 'fail' label, and after > that we free again the same anonymous device number, which in the > meanwhile may have been reallocated somewhere else, because > pending_snapshot->anon_dev still has the same value as in step 1. > > Recently syzbot ran into this and reported the following trace: > > ------------[ cut here ]------------ > ida_free called for id=51 which is not allocated. > WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525 > Modules linked in: > CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 > RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525 > Code: 10 42 80 3c 28 (...) > RSP: 0018:ffffc90015a67300 EFLAGS: 00010246 > RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000 > RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000 > RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4 > R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246 > R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246 > FS: 00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346 > create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837 > create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931 > btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404 > create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848 > btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998 > btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044 > __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306 > btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393 > btrfs_ioctl+0xa74/0xd40 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:871 [inline] > __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857 > do_syscall_64+0xfb/0x240 > entry_SYSCALL_64_after_hwframe+0x6f/0x77 > RIP: 0033:0x7fca3e67dda9 > Code: 28 00 00 00 (...) > RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9 > RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003 > RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658 > </TASK> > > Where we get an explicit message where we attempt to free an anonymous > device number that is not currently allocated. It happens in a different > code path from the example below, at btrfs_get_root_ref(), so this change > may not fix the case triggered by syzbot. > > To fix at least the code path from the example above, change > btrfs_get_root_ref() and its callers to receive a dev_t pointer argument > for the anonymous device number, so that in case it frees the number, it > also resets it to 0, so that up in the call chain we don't attempt to do > the double free. > > Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/ The link is fine as one can go directly to the report, for the syzbot to auto-close the report I think it's most reliable to add Reported-by: syzbot+623a623cfed57f422be1@syzkaller.appspotmail.com > Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read") This has been backported up to 5.10 so we'll need CC: stable@vger.kernel.org # 5.10+ > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>
On Mon, Feb 26, 2024 at 8:27 PM David Sterba <dsterba@suse.cz> wrote: > > On Fri, Feb 23, 2024 at 05:03:19PM +0000, fdmanana@kernel.org wrote: > > From: Filipe Manana <fdmanana@suse.com> > > > > When creating a snapshot we may do a double free of an anonymous device > > in case there's an error comitting the transaction. The second free may > > result in freeing an anonymous device number that was allocated by some > > other subsystem in the kernel or another btrfs filesystem. > > > > The steps that lead to this: > > > > 1) At ioctl.c:create_snapshot() we allocate an anonymous device number > > and assign it to pending_snapshot->anon_dev; > > > > 2) Then we call btrfs_commit_transaction() and end up at > > transaction.c:create_pending_snapshot(); > > > > 3) There we call btrfs_get_new_fs_root() and pass it the anonymous device > > number stored in pending_snapshot->anon_dev; > > > > 4) btrfs_get_new_fs_root() frees that anonymous device number because > > btrfs_lookup_fs_root() returned a root - someone else did a lookup > > of the new root already, which could some task doing backref walking; > > > > 5) After that some error happens in the transaction commit path, and at > > ioctl.c:create_snapshot() we jump to the 'fail' label, and after > > that we free again the same anonymous device number, which in the > > meanwhile may have been reallocated somewhere else, because > > pending_snapshot->anon_dev still has the same value as in step 1. > > > > Recently syzbot ran into this and reported the following trace: > > > > ------------[ cut here ]------------ > > ida_free called for id=51 which is not allocated. > > WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525 > > Modules linked in: > > CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 > > RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525 > > Code: 10 42 80 3c 28 (...) > > RSP: 0018:ffffc90015a67300 EFLAGS: 00010246 > > RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000 > > RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000 > > RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4 > > R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246 > > R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246 > > FS: 00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0 > > Call Trace: > > <TASK> > > btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346 > > create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837 > > create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931 > > btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404 > > create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848 > > btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998 > > btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044 > > __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306 > > btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393 > > btrfs_ioctl+0xa74/0xd40 > > vfs_ioctl fs/ioctl.c:51 [inline] > > __do_sys_ioctl fs/ioctl.c:871 [inline] > > __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857 > > do_syscall_64+0xfb/0x240 > > entry_SYSCALL_64_after_hwframe+0x6f/0x77 > > RIP: 0033:0x7fca3e67dda9 > > Code: 28 00 00 00 (...) > > RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9 > > RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003 > > RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658 > > </TASK> > > > > Where we get an explicit message where we attempt to free an anonymous > > device number that is not currently allocated. It happens in a different > > code path from the example below, at btrfs_get_root_ref(), so this change > > may not fix the case triggered by syzbot. > > > > To fix at least the code path from the example above, change > > btrfs_get_root_ref() and its callers to receive a dev_t pointer argument > > for the anonymous device number, so that in case it frees the number, it > > also resets it to 0, so that up in the call chain we don't attempt to do > > the double free. > > > > Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/ > > The link is fine as one can go directly to the report, for the syzbot to > auto-close the report I think it's most reliable to add > > Reported-by: syzbot+623a623cfed57f422be1@syzkaller.appspotmail.com So I omitted that intentionally because, as I say in the changelog, I don't think it's the same scenario as reported by syzbot. There we attempt the double free in a different place from the scenario I found. Since I mentioned that case, which I believe might be different, I left only the Link tag and not the Reported-by. > > > Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read") > > This has been backported up to 5.10 so we'll need > > CC: stable@vger.kernel.org # 5.10+ > > > Signed-off-by: Filipe Manana <fdmanana@suse.com> > > Reviewed-by: David Sterba <dsterba@suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a2e45ed6ef14..3df5477d48a8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1308,12 +1308,12 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) * * @objectid: root id * @anon_dev: preallocated anonymous block device number for new roots, - * pass 0 for new allocation. + * pass NULL for a new allocation. * @check_ref: whether to check root item references, If true, return -ENOENT * for orphan roots */ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev, + u64 objectid, dev_t *anon_dev, bool check_ref) { struct btrfs_root *root; @@ -1343,9 +1343,9 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, * that common but still possible. In that case, we just need * to free the anon_dev. */ - if (unlikely(anon_dev)) { - free_anon_bdev(anon_dev); - anon_dev = 0; + if (unlikely(anon_dev && *anon_dev)) { + free_anon_bdev(*anon_dev); + *anon_dev = 0; } if (check_ref && btrfs_root_refs(&root->root_item) == 0) { @@ -1367,7 +1367,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, goto fail; } - ret = btrfs_init_fs_root(root, anon_dev); + ret = btrfs_init_fs_root(root, anon_dev ? *anon_dev : 0); if (ret) goto fail; @@ -1403,7 +1403,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root() * and once again by our caller. */ - if (anon_dev) + if (anon_dev && *anon_dev) root->anon_dev = 0; btrfs_put_root(root); return ERR_PTR(ret); @@ -1419,7 +1419,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info, struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, u64 objectid, bool check_ref) { - return btrfs_get_root_ref(fs_info, objectid, 0, check_ref); + return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref); } /* @@ -1427,11 +1427,11 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, * the anonymous block device id * * @objectid: tree objectid - * @anon_dev: if zero, allocate a new anonymous block device or use the - * parameter value + * @anon_dev: if NULL, allocate a new anonymous block device or use the + * parameter value if not NULL */ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev) + u64 objectid, dev_t *anon_dev) { return btrfs_get_root_ref(fs_info, objectid, anon_dev, true); } diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 375f62ae3709..76eb53fe7a11 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -73,7 +73,7 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info); struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, u64 objectid, bool check_ref); struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, - u64 objectid, dev_t anon_dev); + u64 objectid, dev_t *anon_dev); struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info, struct btrfs_path *path, u64 objectid); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 8b80fbea1e72..29e2b8e23363 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -731,7 +731,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, free_extent_buffer(leaf); leaf = NULL; - new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev); + new_root = btrfs_get_new_fs_root(fs_info, objectid, &anon_dev); if (IS_ERR(new_root)) { ret = PTR_ERR(new_root); btrfs_abort_transaction(trans, ret); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index ea080ec2f927..31ac5a04cc02 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1832,7 +1832,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, } key.offset = (u64)-1; - pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev); + pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev); if (IS_ERR(pending->snap)) { ret = PTR_ERR(pending->snap); pending->snap = NULL;