Message ID | a4a0faeba1d195f2eb71fcaae388f5976f822dc2.1727904561.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix the wrong rcu_str_deref() usage | expand |
On Wed, Oct 2, 2024 at 10:39 PM Qu Wenruo <wqu@suse.com> wrote: > > For rcu_str_deref(), it should be called with rcu read lock hold. > But inside the function is_same_device(), the rcu string is accessed > without holding the rcu read lock, causing rcu warnings. > > Fix it by holding the rcu read lock during the path resolution of the > existing device. > > This will be folded into the offending patch "btrfs: avoid unnecessary > device path update for the same device" > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > --- > fs/btrfs/volumes.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 35c4d151b5b0..3867d3c18be5 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path) > if (!device->name) > goto out; > > + rcu_read_lock(); > old_path = rcu_str_deref(device->name); > ret = kern_path(old_path, LOOKUP_FOLLOW, &old); > + rcu_read_unlock(); > if (ret) > goto out; > ret = kern_path(new_path, LOOKUP_FOLLOW, &new); > -- > 2.46.2 > >
On 02.10.24 23:35, Qu Wenruo wrote: > For rcu_str_deref(), it should be called with rcu read lock hold. > But inside the function is_same_device(), the rcu string is accessed > without holding the rcu read lock, causing rcu warnings. > > Fix it by holding the rcu read lock during the path resolution of the > existing device. > > This will be folded into the offending patch "btrfs: avoid unnecessary > device path update for the same device" > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/volumes.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 35c4d151b5b0..3867d3c18be5 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path) > if (!device->name) > goto out; > > + rcu_read_lock(); > old_path = rcu_str_deref(device->name); > ret = kern_path(old_path, LOOKUP_FOLLOW, &old); > + rcu_read_unlock(); > if (ret) > goto out; > ret = kern_path(new_path, LOOKUP_FOLLOW, &new); With that patch applied I get the following in btrfs/006 with Qemu emulated NVMe drives: [ 150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53 [ 151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8 devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282) [ 151.322779] BTRFS info (device nvme0n1): first mount of filesystem e9489dbd-0346-4c23-9b8b-9a0221e235f8 [ 151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel) checksum algorithm [ 151.322794] BTRFS info (device nvme0n1): using free-space-tree [ 152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450) [ 152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450) [ 152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450) [ 152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450) [ 152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450) [ 152.872989] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337 [ 152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 29459, name: (udev-worker) [ 152.872997] preempt_count: 0, expected: 0 [ 152.872999] RCU nest depth: 1, expected: 0 [ 152.873001] 3 locks held by (udev-worker)/29459: [ 152.873003] #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at: btrfs_control_ioctl+0xcb/0x1a0 [ 152.873016] #1: ffff8881105c18d8 (&fs_devs->device_list_mutex){....}-{3:3}, at: device_list_add.constprop.0+0x126/0x670 [ 152.873027] #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at: device_list_add.constprop.0+0x193/0x670 [ 152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G W 6.11.0-rc7+ #800 [ 152.873040] Tainted: [W]=WARN [ 152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 152.873044] Call Trace: [ 152.873047] <TASK> [ 152.873050] dump_stack_lvl+0x4b/0x70 [ 152.873056] __might_resched.cold+0xbf/0xcf [ 152.873061] kmem_cache_alloc_noprof+0x1c5/0x2e0 [ 152.873067] ? getname_kernel+0x29/0x150 [ 152.873073] getname_kernel+0x29/0x150 [ 152.873077] kern_path+0x17/0x50 [ 152.873081] device_list_add.constprop.0+0x1c1/0x670 [ 152.873085] ? device_list_add.constprop.0+0x193/0x670 [ 152.873089] ? device_list_add.constprop.0+0x193/0x670 [ 152.873092] ? btrfs_scan_one_device+0x1bf/0x410 [ 152.873101] btrfs_scan_one_device+0x239/0x410 [ 152.873112] btrfs_control_ioctl+0xdb/0x1a0 [ 152.873119] __x64_sys_ioctl+0x96/0xc0 [ 152.873127] do_syscall_64+0x54/0x110 [ 152.873132] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 152.873139] RIP: 0033:0x7fa8daf99f9b [ 152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00 [ 152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8daf99f9b [ 152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI: 0000000000000005 [ 152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09: 00007ffd34cd6cd0 [ 152.873156] R10: 0000000000000000 R11: 0000000000000246 R12: 00005639a2ce2b60 [ 152.873158] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005 [ 152.873166] </TASK> [ 153.117462] BTRFS info (device nvme1n1): first mount of filesystem 05148ba4-7ee1-480f-af62-7aa0b80b35f2 [ 153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm [ 153.117480] BTRFS info (device nvme1n1): using free-space-tree [ 153.120689] BTRFS info (device nvme1n1): checking UUID tree [ 153.632137] BTRFS info (device nvme0n1): last unmount of filesystem e9489dbd-0346-4c23-9b8b-9a0221e235f8 [ 153.715256] BTRFS info (device nvme1n1): last unmount of filesystem 05148ba4-7ee1-480f-af62-7aa0b80b35f2 [ 153.818188] BTRFS info (device nvme1n1): first mount of filesystem 05148ba4-7ee1-480f-af62-7aa0b80b35f2 [ 153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) checksum algorithm [ 153.818209] BTRFS info (device nvme1n1): using free-space-tree [ 153.884093] BTRFS info (device nvme1n1): last unmount of filesystem 05148ba4-7ee1-480f-af62-7aa0b80b35f2
在 2024/10/4 19:33, Johannes Thumshirn 写道: > On 02.10.24 23:35, Qu Wenruo wrote: >> For rcu_str_deref(), it should be called with rcu read lock hold. >> But inside the function is_same_device(), the rcu string is accessed >> without holding the rcu read lock, causing rcu warnings. >> >> Fix it by holding the rcu read lock during the path resolution of the >> existing device. >> >> This will be folded into the offending patch "btrfs: avoid unnecessary >> device path update for the same device" >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/volumes.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >> index 35c4d151b5b0..3867d3c18be5 100644 >> --- a/fs/btrfs/volumes.c >> +++ b/fs/btrfs/volumes.c >> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path) >> if (!device->name) >> goto out; >> >> + rcu_read_lock(); >> old_path = rcu_str_deref(device->name); >> ret = kern_path(old_path, LOOKUP_FOLLOW, &old); >> + rcu_read_unlock(); >> if (ret) >> goto out; >> ret = kern_path(new_path, LOOKUP_FOLLOW, &new); > > > With that patch applied I get the following in btrfs/006 with Qemu > emulated NVMe drives: > [ 150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53 > [ 151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8 > devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282) > [ 151.322779] BTRFS info (device nvme0n1): first mount of filesystem > e9489dbd-0346-4c23-9b8b-9a0221e235f8 > [ 151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel) > checksum algorithm > [ 151.322794] BTRFS info (device nvme0n1): using free-space-tree > [ 152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450) > [ 152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450) > [ 152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450) > [ 152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450) > [ 152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450) > [ 152.872989] BUG: sleeping function called from invalid context at > include/linux/sched/mm.h:337 My bad, I forgot that the kern_path() can sleep, thus it can not be utilized inside RCU context. Instead I should dump the rcu string into a local buffer instead. And of course I should enable the sleep in atomic for my aarch64 VM too. Sorry for the RCU related regression again and again. Qu > [ 152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: > 29459, name: (udev-worker) > [ 152.872997] preempt_count: 0, expected: 0 > [ 152.872999] RCU nest depth: 1, expected: 0 > [ 152.873001] 3 locks held by (udev-worker)/29459: > [ 152.873003] #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at: > btrfs_control_ioctl+0xcb/0x1a0 > [ 152.873016] #1: ffff8881105c18d8 > (&fs_devs->device_list_mutex){....}-{3:3}, at: > device_list_add.constprop.0+0x126/0x670 > [ 152.873027] #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at: > device_list_add.constprop.0+0x193/0x670 > [ 152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G > W 6.11.0-rc7+ #800 > [ 152.873040] Tainted: [W]=WARN > [ 152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 152.873044] Call Trace: > [ 152.873047] <TASK> > [ 152.873050] dump_stack_lvl+0x4b/0x70 > [ 152.873056] __might_resched.cold+0xbf/0xcf > [ 152.873061] kmem_cache_alloc_noprof+0x1c5/0x2e0 > [ 152.873067] ? getname_kernel+0x29/0x150 > [ 152.873073] getname_kernel+0x29/0x150 > [ 152.873077] kern_path+0x17/0x50 > [ 152.873081] device_list_add.constprop.0+0x1c1/0x670 > [ 152.873085] ? device_list_add.constprop.0+0x193/0x670 > [ 152.873089] ? device_list_add.constprop.0+0x193/0x670 > [ 152.873092] ? btrfs_scan_one_device+0x1bf/0x410 > [ 152.873101] btrfs_scan_one_device+0x239/0x410 > [ 152.873112] btrfs_control_ioctl+0xdb/0x1a0 > [ 152.873119] __x64_sys_ioctl+0x96/0xc0 > [ 152.873127] do_syscall_64+0x54/0x110 > [ 152.873132] entry_SYSCALL_64_after_hwframe+0x76/0x7e > [ 152.873139] RIP: 0033:0x7fa8daf99f9b > [ 152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 > 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f > 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > [ 152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX: > 0000000000000010 > [ 152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: > 00007fa8daf99f9b > [ 152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI: > 0000000000000005 > [ 152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09: > 00007ffd34cd6cd0 > [ 152.873156] R10: 0000000000000000 R11: 0000000000000246 R12: > 00005639a2ce2b60 > [ 152.873158] R13: 0000000000000000 R14: 0000000000000000 R15: > 0000000000000005 > [ 152.873166] </TASK> > [ 153.117462] BTRFS info (device nvme1n1): first mount of filesystem > 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > [ 153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) > checksum algorithm > [ 153.117480] BTRFS info (device nvme1n1): using free-space-tree > [ 153.120689] BTRFS info (device nvme1n1): checking UUID tree > [ 153.632137] BTRFS info (device nvme0n1): last unmount of filesystem > e9489dbd-0346-4c23-9b8b-9a0221e235f8 > [ 153.715256] BTRFS info (device nvme1n1): last unmount of filesystem > 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > [ 153.818188] BTRFS info (device nvme1n1): first mount of filesystem > 05148ba4-7ee1-480f-af62-7aa0b80b35f2 > [ 153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) > checksum algorithm > [ 153.818209] BTRFS info (device nvme1n1): using free-space-tree > [ 153.884093] BTRFS info (device nvme1n1): last unmount of filesystem > 05148ba4-7ee1-480f-af62-7aa0b80b35f2
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 35c4d151b5b0..3867d3c18be5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path) if (!device->name) goto out; + rcu_read_lock(); old_path = rcu_str_deref(device->name); ret = kern_path(old_path, LOOKUP_FOLLOW, &old); + rcu_read_unlock(); if (ret) goto out; ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
For rcu_str_deref(), it should be called with rcu read lock hold. But inside the function is_same_device(), the rcu string is accessed without holding the rcu read lock, causing rcu warnings. Fix it by holding the rcu read lock during the path resolution of the existing device. This will be folded into the offending patch "btrfs: avoid unnecessary device path update for the same device" Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/volumes.c | 2 ++ 1 file changed, 2 insertions(+)