diff mbox series

[RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree

Message ID 20200714172153.12956-1-marcos@mpdesouza.com (mailing list archive)
State New, archived
Headers show
Series [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree | expand

Commit Message

Marcos Paulo de Souza July 14, 2020, 5:21 p.m. UTC
From: Marcos Paulo de Souza <mpdesouza@suse.com>

There is a lockdep report when running xfstests btrfs/161 (seed + sprout
devices) related to chunk_mutex:

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc4-default #504 Not tainted
------------------------------------------------------
mount/454 is trying to acquire lock:
ffff8881133058e8 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x340

but task is already holding lock:
ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x139/0x13e0
       btrfs_init_new_device+0x1ed1/0x3c50
       btrfs_ioctl+0x19b4/0x8130
       ksys_ioctl+0xa9/0xf0
       __x64_sys_ioctl+0x6f/0xb0
       do_syscall_64+0x50/0xd0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __lock_acquire+0x2df6/0x4da0
       lock_acquire+0x176/0x820
       __mutex_lock+0x139/0x13e0
       clone_fs_devices+0x4d/0x340
       read_one_dev+0xa5c/0x13e0
       btrfs_read_chunk_tree+0x480/0x600
       open_ctree+0x244b/0x450d
       btrfs_mount_root.cold.80+0x10/0x129
       legacy_get_tree+0xff/0x1f0
       vfs_get_tree+0x83/0x250
       fc_mount+0xf/0x90
       vfs_kern_mount.part.47+0x5c/0x80
       btrfs_mount+0x215/0xbcf
       legacy_get_tree+0xff/0x1f0
       vfs_get_tree+0x83/0x250
       do_mount+0x106d/0x16f0
       __x64_sys_mount+0x162/0x1b0
       do_syscall_64+0x50/0xd0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->chunk_mutex);
                               lock(&fs_devs->device_list_mutex);
                               lock(&fs_info->chunk_mutex);
  lock(&fs_devs->device_list_mutex);

 *** DEADLOCK ***

3 locks held by mount/454:
 #0: ffff8881119340e8 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super.isra.21+0x183/0x990
 #1: ffffffff83b260d0 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xb6/0x600
 #2: ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600

stack backtrace:
CPU: 3 PID: 454 Comm: mount Not tainted 5.8.0-rc4-default #504
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x9d/0xe0
 check_noncircular+0x351/0x410
 ? print_circular_bug.isra.41+0x360/0x360
 ? mark_lock+0x12d/0x14d0
 ? rcu_read_unlock+0x40/0x40
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0x18/0x170
 ? sched_clock_cpu+0x18/0x170
 __lock_acquire+0x2df6/0x4da0
 ? lockdep_hardirqs_on_prepare+0x540/0x540
 ? _raw_spin_unlock_irqrestore+0x3e/0x50
 ? trace_hardirqs_on+0x20/0x170
 ? stack_depot_save+0x260/0x470
 lock_acquire+0x176/0x820
 ? clone_fs_devices+0x4d/0x340
 ? btrfs_mount_root.cold.80+0x10/0x129
 ? rcu_read_unlock+0x40/0x40
 ? vfs_kern_mount.part.47+0x5c/0x80
 ? btrfs_mount+0x215/0xbcf
 ? legacy_get_tree+0xff/0x1f0
 ? vfs_get_tree+0x83/0x250
 ? do_mount+0x106d/0x16f0
 ? __x64_sys_mount+0x162/0x1b0
 ? do_syscall_64+0x50/0xd0
 __mutex_lock+0x139/0x13e0
 ? clone_fs_devices+0x4d/0x340
 ? clone_fs_devices+0x4d/0x340
 ? mark_held_locks+0xb7/0x120
 ? mutex_lock_io_nested+0x12a0/0x12a0
 ? rcu_read_lock_sched_held+0xa1/0xd0
 ? lockdep_init_map_waits+0x29f/0x810
 ? lockdep_init_map_waits+0x29f/0x810
 ? debug_mutex_init+0x31/0x60
 ? clone_fs_devices+0x4d/0x340
 clone_fs_devices+0x4d/0x340
 ? read_extent_buffer+0x15b/0x2a0
 read_one_dev+0xa5c/0x13e0
 ? split_leaf+0xef0/0xef0
 ? read_one_chunk+0xb20/0xb20
 ? btrfs_get_token_32+0x730/0x730
 ? memcpy+0x38/0x60
 ? read_extent_buffer+0x15b/0x2a0
 btrfs_read_chunk_tree+0x480/0x600
 ? btrfs_check_rw_degradable+0x340/0x340
 ? btrfs_root_node+0x2d/0x240
 ? memcpy+0x38/0x60
 ? read_extent_buffer+0x15b/0x2a0
 ? btrfs_root_node+0x2d/0x240
 open_ctree+0x244b/0x450d
 ? close_ctree+0x61c/0x61c
 ? sget+0x328/0x400
 btrfs_mount_root.cold.80+0x10/0x129
 ? parse_rescue_options+0x260/0x260
 ? rcu_read_lock_sched_held+0xa1/0xd0
 ? rcu_read_lock_bh_held+0xb0/0xb0
 ? kfree+0x2e2/0x330
 ? parse_rescue_options+0x260/0x260
 legacy_get_tree+0xff/0x1f0
 vfs_get_tree+0x83/0x250
 fc_mount+0xf/0x90
 vfs_kern_mount.part.47+0x5c/0x80
 btrfs_mount+0x215/0xbcf
 ? check_object+0xb3/0x2c0
 ? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
 ? ___slab_alloc+0x4e5/0x570
 ? vfs_parse_fs_string+0xc0/0x100
 ? rcu_read_lock_sched_held+0xa1/0xd0
 ? rcu_read_lock_bh_held+0xb0/0xb0
 ? kfree+0x2e2/0x330
 ? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
 ? legacy_get_tree+0xff/0x1f0
 legacy_get_tree+0xff/0x1f0
 vfs_get_tree+0x83/0x250
 do_mount+0x106d/0x16f0
 ? lock_downgrade+0x720/0x720
 ? copy_mount_string+0x20/0x20
 ? _copy_from_user+0xbe/0x100
 ? memdup_user+0x47/0x70
 __x64_sys_mount+0x162/0x1b0
 do_syscall_64+0x50/0xd0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f82dd1583ca
Code: Bad RIP value.
RSP: 002b:00007ffcf6935fc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000562f10678500 RCX: 00007f82dd1583ca
RDX: 0000562f1067f8e0 RSI: 0000562f1067b3f0 RDI: 0000562f106786e0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000562f106786e0
R13: 0000562f1067f8e0 R14: 0000000000000000 R15: 00007f82dd6798a4

In volumes.c there are comments stating that chunk_mutex is used to
protect add/remove chunks, trim or add/remove devices. Since
btrfs_read_chunk_tree is only called on mount, and trim and chunk alloc
cannot happen at mount time, it's safe to remove this lock/unlock.

Also, btrfs_ioctl_{add|rm}_dev calls set BTRFS_FS_EXCL_OP, which should
be safe.

Dropping the mutex lock/unlock solves the lockdep warning.

Suggested-by: Nokolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 fs/btrfs/volumes.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Nikolay Borisov July 14, 2020, 6:05 p.m. UTC | #1
On 14.07.20 г. 20:43 ч., Nikolay Borisov wrote:
> 
> 
> On 14.07.20 г. 20:21 ч., Marcos Paulo de Souza wrote:
>> From: Marcos Paulo de Souza <mpdesouza@suse.com>
>>
>> There is a lockdep report when running xfstests btrfs/161 (seed + sprout
>> devices) related to chunk_mutex:
>>
> 
> For whatever reason this submission seems garbled. In any case my
> proposal was to replace the chunk mutex with FS_EXCL_OP in
> btrfs_read_chunk_tree which you haven't done. You simply remove the
> chunk mutex.
> 

Actually the mutex could possibly be removed because in order to
add/remove a device we need a well-formed mount i.e the mount process
must have been completed. Given that chunk tree read happens as part of
this I'd assume it's safe to drop it altogether.
David Sterba July 15, 2020, 1:14 p.m. UTC | #2
On Tue, Jul 14, 2020 at 02:21:53PM -0300, Marcos Paulo de Souza wrote:
> From: Marcos Paulo de Souza <mpdesouza@suse.com>
> In volumes.c there are comments stating that chunk_mutex is used to
> protect add/remove chunks, trim or add/remove devices. Since
> btrfs_read_chunk_tree is only called on mount, and trim and chunk alloc
> cannot happen at mount time, it's safe to remove this lock/unlock.
> 
> Also, btrfs_ioctl_{add|rm}_dev calls set BTRFS_FS_EXCL_OP, which should
> be safe.

I've analyzed it in
https://lore.kernel.org/linux-btrfs/20200512232827.GI18421@twin.jikos.cz/
and an ioctl gets called during mount with the seeding device. I don't
see it mentioned in your changelog.

The BTRFS_FS_EXCL_OP do not affect that because there's only one the
device add at this time.

> Dropping the mutex lock/unlock solves the lockdep warning.

Yes, but the question is if this is also correct :) Right now I don't
see how the excl_op flag is relevant here.

The patch proposed in
https://lore.kernel.org/linux-btrfs/20200513194659.34493-1-anand.jain@oracle.com/
has some issues, I tried to fix them back then but for some reason I
don't remember the patch did not work as expected.
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c7a3d4d730a3..94cbdadd9d26 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7033,7 +7033,6 @@  int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	 * otherwise we don't need it.
 	 */
 	mutex_lock(&uuid_mutex);
-	mutex_lock(&fs_info->chunk_mutex);
 
 	/*
 	 * Read all device items, and then all the chunk items. All
@@ -7100,7 +7099,6 @@  int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	}
 	ret = 0;
 error:
-	mutex_unlock(&fs_info->chunk_mutex);
 	mutex_unlock(&uuid_mutex);
 
 	btrfs_free_path(path);