Message ID | 2de92bdcebd36e4119467797dedae8a9d97a9df3.1677314616.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix the mount crash caused by confusing return value | expand |
On 2/25/23 16:44, Qu Wenruo wrote: > [BUG] > Btrfs mount can lead to crash if the fs has critical trees corrupted: > > # mkfs.btrfs -f /dev/test/scratch1 > # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is tree root > mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1 > mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1 > # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" /dev/test/scratch1 > # mount /dev/test/scratch1 /mnt/btrfs > > And the above mount would crash with the following dmesg: > > BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0 > BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0 > BTRFS warning (device dm-4): couldn't read tree root > ================================================================== > BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs] > Read of size 8 at addr 00000000000001f7 by task mount/4040 > > [CAUSE] > In open_ctree(), we have two variables to indicates errors: @ret and > @err. > > Unfortunately such confusion leads to the above crash, as in the error > handling of load_super_root(), we just goto fail_tree_roots label, but > at the end of error handling, we return @err instead of @ret. > > Thus even we failed to load tree root, we still return 0 for > open_ctree(), thus later btrfs_iget() would fail. There are many child functions in open_ctree() that rely on the default value of @err, which is -EINVAL, to return an error instead of ret. The decoupling of @ret and the actual error returned by open_ctree() is intentional. IMO. However, the bug is that btrfs_init_btree_inode() return value is assigned to @err instead of @ret. ret = btrfs_init_btree_inode(sb); And the regression is caused by the following commit: commit 097421116e288dd3f5baaf1dd7e86035db60336f btrfs: move all btree inode initialization into btrfs_init_btree_inode This commit is still in misc-next. We can fold a fix as below: diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 48368d4bc331..0e0c30fe6df6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3360,9 +3360,11 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device goto fail; } - err = btrfs_init_btree_inode(sb); - if (err) + ret = btrfs_init_btree_inode(sb); + if (ret) { + err = ret; goto fail; + } invalidate_bdev(fs_devices->latest_dev->bdev);
On 2023/2/27 14:14, Anand Jain wrote: > On 2/25/23 16:44, Qu Wenruo wrote: >> [BUG] >> Btrfs mount can lead to crash if the fs has critical trees corrupted: >> >> # mkfs.btrfs -f /dev/test/scratch1 >> # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is >> tree root >> mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1 >> mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1 >> # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" >> /dev/test/scratch1 >> # mount /dev/test/scratch1 /mnt/btrfs >> >> And the above mount would crash with the following dmesg: >> >> BTRFS warning (device dm-4): checksum verify failed on logical >> 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0 >> BTRFS warning (device dm-4): checksum verify failed on logical >> 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0 >> BTRFS warning (device dm-4): couldn't read tree root >> ================================================================== >> BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs] >> Read of size 8 at addr 00000000000001f7 by task mount/4040 >> >> [CAUSE] >> In open_ctree(), we have two variables to indicates errors: @ret and >> @err. >> >> Unfortunately such confusion leads to the above crash, as in the error >> handling of load_super_root(), we just goto fail_tree_roots label, but >> at the end of error handling, we return @err instead of @ret. >> >> Thus even we failed to load tree root, we still return 0 for >> open_ctree(), thus later btrfs_iget() would fail. > > > There are many child functions in open_ctree() that rely on the default > value of @err, which is -EINVAL, to return an error instead of ret. That's even more bug prone. After the first call site assigning @err, there is no more default value for @err. And this is not the first bug involving two error indicating numbers. Thus unless definitely needed, I strongly recommend not to use two error values. Thanks, Qu > > The decoupling of @ret and the actual error returned by open_ctree() > is intentional. IMO. > > However, the bug is that btrfs_init_btree_inode() return value is > assigned to @err instead of @ret. > > ret = btrfs_init_btree_inode(sb); > > And the regression is caused by the following commit: > > commit 097421116e288dd3f5baaf1dd7e86035db60336f > btrfs: move all btree inode initialization into btrfs_init_btree_inode > > This commit is still in misc-next. We can fold a fix as below: > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 48368d4bc331..0e0c30fe6df6 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3360,9 +3360,11 @@ int __cold open_ctree(struct super_block *sb, > struct btrfs_fs_devices *fs_device > goto fail; > } > > - err = btrfs_init_btree_inode(sb); > - if (err) > + ret = btrfs_init_btree_inode(sb); > + if (ret) { > + err = ret; > goto fail; > + } > > invalidate_bdev(fs_devices->latest_dev->bdev); > >
On Mon, Feb 27, 2023 at 02:34:25PM +0800, Qu Wenruo wrote: > > > On 2023/2/27 14:14, Anand Jain wrote: > > On 2/25/23 16:44, Qu Wenruo wrote: > >> [BUG] > >> Btrfs mount can lead to crash if the fs has critical trees corrupted: > >> > >> # mkfs.btrfs -f /dev/test/scratch1 > >> # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is > >> tree root > >> mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1 > >> mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1 > >> # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" > >> /dev/test/scratch1 > >> # mount /dev/test/scratch1 /mnt/btrfs > >> > >> And the above mount would crash with the following dmesg: > >> > >> BTRFS warning (device dm-4): checksum verify failed on logical > >> 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0 > >> BTRFS warning (device dm-4): checksum verify failed on logical > >> 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0 > >> BTRFS warning (device dm-4): couldn't read tree root > >> ================================================================== > >> BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs] > >> Read of size 8 at addr 00000000000001f7 by task mount/4040 > >> > >> [CAUSE] > >> In open_ctree(), we have two variables to indicates errors: @ret and > >> @err. > >> > >> Unfortunately such confusion leads to the above crash, as in the error > >> handling of load_super_root(), we just goto fail_tree_roots label, but > >> at the end of error handling, we return @err instead of @ret. > >> > >> Thus even we failed to load tree root, we still return 0 for > >> open_ctree(), thus later btrfs_iget() would fail. > > > > > > There are many child functions in open_ctree() that rely on the default > > value of @err, which is -EINVAL, to return an error instead of ret. > > That's even more bug prone. > > After the first call site assigning @err, there is no more default value > for @err. > > And this is not the first bug involving two error indicating numbers. > Thus unless definitely needed, I strongly recommend not to use two error > values. Agreed, the error handling styles in open_ctree are mixed and we should switch it to using 'ret' with error value set before gotos. Basically what you did in this patch, please resend it as a cleanup, I've merged toe fixup from Anand to the original patch.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 48368d4bc331..e11aca6353a5 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3339,14 +3339,11 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device struct btrfs_root *tree_root; struct btrfs_root *chunk_root; int ret; - int err = -EINVAL; int level; ret = init_mount_fs_info(fs_info, sb); - if (ret) { - err = ret; + if (ret) goto fail; - } /* These need to be init'ed before we start creating inodes and such. */ tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID, @@ -3356,12 +3353,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device GFP_KERNEL); fs_info->chunk_root = chunk_root; if (!tree_root || !chunk_root) { - err = -ENOMEM; + ret = -ENOMEM; goto fail; } - err = btrfs_init_btree_inode(sb); - if (err) + ret = btrfs_init_btree_inode(sb); + if (ret) goto fail; invalidate_bdev(fs_devices->latest_dev->bdev); @@ -3371,7 +3368,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device */ disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev); if (IS_ERR(disk_super)) { - err = PTR_ERR(disk_super); + ret = PTR_ERR(disk_super); goto fail_alloc; } @@ -3383,7 +3380,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device if (!btrfs_supported_super_csum(csum_type)) { btrfs_err(fs_info, "unsupported checksum algorithm: %u", csum_type); - err = -EINVAL; + ret = -EINVAL; btrfs_release_disk_super(disk_super); goto fail_alloc; } @@ -3392,7 +3389,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device ret = btrfs_init_csum_hash(fs_info, csum_type); if (ret) { - err = ret; btrfs_release_disk_super(disk_super); goto fail_alloc; } @@ -3403,7 +3399,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device */ if (btrfs_check_super_csum(fs_info, disk_super)) { btrfs_err(fs_info, "superblock checksum mismatch"); - err = -EINVAL; + ret = -EINVAL; btrfs_release_disk_super(disk_super); goto fail_alloc; } @@ -3433,7 +3429,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device ret = btrfs_validate_mount_super(fs_info); if (ret) { btrfs_err(fs_info, "superblock contains fatal errors"); - err = -EINVAL; + ret = -EINVAL; goto fail_alloc; } @@ -3465,16 +3461,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device fs_info->stripesize = stripesize; ret = btrfs_parse_options(fs_info, options, sb->s_flags); - if (ret) { - err = ret; + if (ret) goto fail_alloc; - } ret = btrfs_check_features(fs_info, !sb_rdonly(sb)); - if (ret < 0) { - err = ret; + if (ret < 0) goto fail_alloc; - } if (sectorsize < PAGE_SIZE) { struct btrfs_subpage_info *subpage_info; @@ -3501,10 +3493,8 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device } ret = btrfs_init_workqueues(fs_info); - if (ret) { - err = ret; + if (ret) goto fail_sb_buffer; - } sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super); sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE); @@ -3702,16 +3692,14 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device !btrfs_test_opt(fs_info, NOLOGREPLAY)) { btrfs_info(fs_info, "start tree-log replay"); ret = btrfs_replay_log(fs_info, fs_devices); - if (ret) { - err = ret; + if (ret) goto fail_qgroup; - } } fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true); if (IS_ERR(fs_info->fs_root)) { - err = PTR_ERR(fs_info->fs_root); - btrfs_warn(fs_info, "failed to read fs tree: %d", err); + ret = PTR_ERR(fs_info->fs_root); + btrfs_warn(fs_info, "failed to read fs tree: %d", ret); fs_info->fs_root = NULL; goto fail_qgroup; } @@ -3788,7 +3776,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device iput(fs_info->btree_inode); fail: btrfs_close_devices(fs_info->fs_devices); - return err; + return ret; } ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
[BUG] Btrfs mount can lead to crash if the fs has critical trees corrupted: # mkfs.btrfs -f /dev/test/scratch1 # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is tree root mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1 mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1 # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs And the above mount would crash with the following dmesg: BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0 BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0 BTRFS warning (device dm-4): couldn't read tree root ================================================================== BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs] Read of size 8 at addr 00000000000001f7 by task mount/4040 [CAUSE] In open_ctree(), we have two variables to indicates errors: @ret and @err. Unfortunately such confusion leads to the above crash, as in the error handling of load_super_root(), we just goto fail_tree_roots label, but at the end of error handling, we return @err instead of @ret. Thus even we failed to load tree root, we still return 0 for open_ctree(), thus later btrfs_iget() would fail. [FIX] Instead of such dangerous mixing, remove @err variable completely, and use @ret only. Signed-off-by: Qu Wenruo <wqu@suse.com> --- Unfortunately I failed to pin down a single patch causing the problem. Would craft a test case for it. --- fs/btrfs/disk-io.c | 42 +++++++++++++++--------------------------- 1 file changed, 15 insertions(+), 27 deletions(-)