diff mbox series

btrfs: fix the wrong rcu_str_deref() usage

Message ID a4a0faeba1d195f2eb71fcaae388f5976f822dc2.1727904561.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix the wrong rcu_str_deref() usage | expand

Commit Message

Qu Wenruo Oct. 2, 2024, 9:29 p.m. UTC
For rcu_str_deref(), it should be called with rcu read lock hold.
But inside the function is_same_device(), the rcu string is accessed
without holding the rcu read lock, causing rcu warnings.

Fix it by holding the rcu read lock during the path resolution of the
existing device.

This will be folded into the offending patch "btrfs: avoid unnecessary
device path update for the same device"

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Filipe Manana Oct. 3, 2024, 10:10 a.m. UTC | #1
On Wed, Oct 2, 2024 at 10:39 PM Qu Wenruo <wqu@suse.com> wrote:
>
> For rcu_str_deref(), it should be called with rcu read lock hold.
> But inside the function is_same_device(), the rcu string is accessed
> without holding the rcu read lock, causing rcu warnings.
>
> Fix it by holding the rcu read lock during the path resolution of the
> existing device.
>
> This will be folded into the offending patch "btrfs: avoid unnecessary
> device path update for the same device"
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 35c4d151b5b0..3867d3c18be5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
>         if (!device->name)
>                 goto out;
>
> +       rcu_read_lock();
>         old_path = rcu_str_deref(device->name);
>         ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> +       rcu_read_unlock();
>         if (ret)
>                 goto out;
>         ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
> --
> 2.46.2
>
>
Johannes Thumshirn Oct. 4, 2024, 10:03 a.m. UTC | #2
On 02.10.24 23:35, Qu Wenruo wrote:
> For rcu_str_deref(), it should be called with rcu read lock hold.
> But inside the function is_same_device(), the rcu string is accessed
> without holding the rcu read lock, causing rcu warnings.
> 
> Fix it by holding the rcu read lock during the path resolution of the
> existing device.
> 
> This will be folded into the offending patch "btrfs: avoid unnecessary
> device path update for the same device"
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/volumes.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 35c4d151b5b0..3867d3c18be5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
>   	if (!device->name)
>   		goto out;
>   
> +	rcu_read_lock();
>   	old_path = rcu_str_deref(device->name);
>   	ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
> +	rcu_read_unlock();
>   	if (ret)
>   		goto out;
>   	ret = kern_path(new_path, LOOKUP_FOLLOW, &new);


With that patch applied I get the following in btrfs/006 with Qemu 
emulated NVMe drives:
[  150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53
[  151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8 
devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282)
[  151.322779] BTRFS info (device nvme0n1): first mount of filesystem 
e9489dbd-0346-4c23-9b8b-9a0221e235f8
[  151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel) 
checksum algorithm
[  151.322794] BTRFS info (device nvme0n1): using free-space-tree
[  152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 
devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450)
[  152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 
devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450)
[  152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 
devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450)
[  152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 
devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450)
[  152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2 
devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450)
[  152.872989] BUG: sleeping function called from invalid context at 
include/linux/sched/mm.h:337
[  152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 
29459, name: (udev-worker)
[  152.872997] preempt_count: 0, expected: 0
[  152.872999] RCU nest depth: 1, expected: 0
[  152.873001] 3 locks held by (udev-worker)/29459:
[  152.873003]  #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at: 
btrfs_control_ioctl+0xcb/0x1a0
[  152.873016]  #1: ffff8881105c18d8 
(&fs_devs->device_list_mutex){....}-{3:3}, at: 
device_list_add.constprop.0+0x126/0x670
[  152.873027]  #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at: 
device_list_add.constprop.0+0x193/0x670
[  152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G 
      W          6.11.0-rc7+ #800
[  152.873040] Tainted: [W]=WARN
[  152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  152.873044] Call Trace:
[  152.873047]  <TASK>
[  152.873050]  dump_stack_lvl+0x4b/0x70
[  152.873056]  __might_resched.cold+0xbf/0xcf
[  152.873061]  kmem_cache_alloc_noprof+0x1c5/0x2e0
[  152.873067]  ? getname_kernel+0x29/0x150
[  152.873073]  getname_kernel+0x29/0x150
[  152.873077]  kern_path+0x17/0x50
[  152.873081]  device_list_add.constprop.0+0x1c1/0x670
[  152.873085]  ? device_list_add.constprop.0+0x193/0x670
[  152.873089]  ? device_list_add.constprop.0+0x193/0x670
[  152.873092]  ? btrfs_scan_one_device+0x1bf/0x410
[  152.873101]  btrfs_scan_one_device+0x239/0x410
[  152.873112]  btrfs_control_ioctl+0xdb/0x1a0
[  152.873119]  __x64_sys_ioctl+0x96/0xc0
[  152.873127]  do_syscall_64+0x54/0x110
[  152.873132]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  152.873139] RIP: 0033:0x7fa8daf99f9b
[  152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 
00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 
05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX: 
0000000000000010
[  152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007fa8daf99f9b
[  152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI: 
0000000000000005
[  152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09: 
00007ffd34cd6cd0
[  152.873156] R10: 0000000000000000 R11: 0000000000000246 R12: 
00005639a2ce2b60
[  152.873158] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000005
[  152.873166]  </TASK>
[  153.117462] BTRFS info (device nvme1n1): first mount of filesystem 
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[  153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) 
checksum algorithm
[  153.117480] BTRFS info (device nvme1n1): using free-space-tree
[  153.120689] BTRFS info (device nvme1n1): checking UUID tree
[  153.632137] BTRFS info (device nvme0n1): last unmount of filesystem 
e9489dbd-0346-4c23-9b8b-9a0221e235f8
[  153.715256] BTRFS info (device nvme1n1): last unmount of filesystem 
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[  153.818188] BTRFS info (device nvme1n1): first mount of filesystem 
05148ba4-7ee1-480f-af62-7aa0b80b35f2
[  153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel) 
checksum algorithm
[  153.818209] BTRFS info (device nvme1n1): using free-space-tree
[  153.884093] BTRFS info (device nvme1n1): last unmount of filesystem 
05148ba4-7ee1-480f-af62-7aa0b80b35f2
Qu Wenruo Oct. 4, 2024, 10:08 a.m. UTC | #3
在 2024/10/4 19:33, Johannes Thumshirn 写道:
> On 02.10.24 23:35, Qu Wenruo wrote:
>> For rcu_str_deref(), it should be called with rcu read lock hold.
>> But inside the function is_same_device(), the rcu string is accessed
>> without holding the rcu read lock, causing rcu warnings.
>>
>> Fix it by holding the rcu read lock during the path resolution of the
>> existing device.
>>
>> This will be folded into the offending patch "btrfs: avoid unnecessary
>> device path update for the same device"
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>    fs/btrfs/volumes.c | 2 ++
>>    1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 35c4d151b5b0..3867d3c18be5 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -807,8 +807,10 @@ static bool is_same_device(struct btrfs_device *device, const char *new_path)
>>    	if (!device->name)
>>    		goto out;
>>
>> +	rcu_read_lock();
>>    	old_path = rcu_str_deref(device->name);
>>    	ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
>> +	rcu_read_unlock();
>>    	if (ret)
>>    		goto out;
>>    	ret = kern_path(new_path, LOOKUP_FOLLOW, &new);
>
>
> With that patch applied I get the following in btrfs/006 with Qemu
> emulated NVMe drives:
> [  150.421039] run fstests btrfs/006 at 2024-10-04 09:50:53
> [  151.322158] BTRFS: device fsid e9489dbd-0346-4c23-9b8b-9a0221e235f8
> devid 1 transid 8 /dev/nvme0n1 (259:1) scanned by mount (29282)
> [  151.322779] BTRFS info (device nvme0n1): first mount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [  151.322789] BTRFS info (device nvme0n1): using crc32c (crc32c-intel)
> checksum algorithm
> [  151.322794] BTRFS info (device nvme0n1): using free-space-tree
> [  152.831300] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 1 transid 6 /dev/nvme1n1 (259:0) scanned by mkfs.btrfs (29450)
> [  152.831712] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 2 transid 6 /dev/nvme2n1 (259:3) scanned by mkfs.btrfs (29450)
> [  152.832292] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 3 transid 6 /dev/nvme3n1 (259:2) scanned by mkfs.btrfs (29450)
> [  152.832747] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 4 transid 6 /dev/nvme4n1 (259:4) scanned by mkfs.btrfs (29450)
> [  152.833189] BTRFS: device fsid 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> devid 5 transid 6 /dev/nvme5n1 (259:5) scanned by mkfs.btrfs (29450)
> [  152.872989] BUG: sleeping function called from invalid context at
> include/linux/sched/mm.h:337

My bad, I forgot that the kern_path() can sleep, thus it can not be
utilized inside RCU context.

Instead I should dump the rcu string into a local buffer instead.

And of course I should enable the sleep in atomic for my aarch64 VM too.

Sorry for the RCU related regression again and again.
Qu

> [  152.872995] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid:
> 29459, name: (udev-worker)
> [  152.872997] preempt_count: 0, expected: 0
> [  152.872999] RCU nest depth: 1, expected: 0
> [  152.873001] 3 locks held by (udev-worker)/29459:
> [  152.873003]  #0: ffffffff8253e968 (uuid_mutex){....}-{3:3}, at:
> btrfs_control_ioctl+0xcb/0x1a0
> [  152.873016]  #1: ffff8881105c18d8
> (&fs_devs->device_list_mutex){....}-{3:3}, at:
> device_list_add.constprop.0+0x126/0x670
> [  152.873027]  #2: ffffffff824cdb60 (rcu_read_lock){....}-{1:2}, at:
> device_list_add.constprop.0+0x193/0x670
> [  152.873036] CPU: 0 UID: 0 PID: 29459 Comm: (udev-worker) Tainted: G
>        W          6.11.0-rc7+ #800
> [  152.873040] Tainted: [W]=WARN
> [  152.873042] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  152.873044] Call Trace:
> [  152.873047]  <TASK>
> [  152.873050]  dump_stack_lvl+0x4b/0x70
> [  152.873056]  __might_resched.cold+0xbf/0xcf
> [  152.873061]  kmem_cache_alloc_noprof+0x1c5/0x2e0
> [  152.873067]  ? getname_kernel+0x29/0x150
> [  152.873073]  getname_kernel+0x29/0x150
> [  152.873077]  kern_path+0x17/0x50
> [  152.873081]  device_list_add.constprop.0+0x1c1/0x670
> [  152.873085]  ? device_list_add.constprop.0+0x193/0x670
> [  152.873089]  ? device_list_add.constprop.0+0x193/0x670
> [  152.873092]  ? btrfs_scan_one_device+0x1bf/0x410
> [  152.873101]  btrfs_scan_one_device+0x239/0x410
> [  152.873112]  btrfs_control_ioctl+0xdb/0x1a0
> [  152.873119]  __x64_sys_ioctl+0x96/0xc0
> [  152.873127]  do_syscall_64+0x54/0x110
> [  152.873132]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  152.873139] RIP: 0033:0x7fa8daf99f9b
> [  152.873143] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10
> 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f
> 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [  152.873146] RSP: 002b:00007ffd34cd5cf0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  152.873150] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007fa8daf99f9b
> [  152.873152] RDX: 00007ffd34cd5d70 RSI: 0000000090009427 RDI:
> 0000000000000005
> [  152.873154] RBP: 00007ffd34cd5d70 R08: 0000000000000000 R09:
> 00007ffd34cd6cd0
> [  152.873156] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00005639a2ce2b60
> [  152.873158] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000005
> [  152.873166]  </TASK>
> [  153.117462] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [  153.117475] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [  153.117480] BTRFS info (device nvme1n1): using free-space-tree
> [  153.120689] BTRFS info (device nvme1n1): checking UUID tree
> [  153.632137] BTRFS info (device nvme0n1): last unmount of filesystem
> e9489dbd-0346-4c23-9b8b-9a0221e235f8
> [  153.715256] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [  153.818188] BTRFS info (device nvme1n1): first mount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
> [  153.818203] BTRFS info (device nvme1n1): using crc32c (crc32c-intel)
> checksum algorithm
> [  153.818209] BTRFS info (device nvme1n1): using free-space-tree
> [  153.884093] BTRFS info (device nvme1n1): last unmount of filesystem
> 05148ba4-7ee1-480f-af62-7aa0b80b35f2
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 35c4d151b5b0..3867d3c18be5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -807,8 +807,10 @@  static bool is_same_device(struct btrfs_device *device, const char *new_path)
 	if (!device->name)
 		goto out;
 
+	rcu_read_lock();
 	old_path = rcu_str_deref(device->name);
 	ret = kern_path(old_path, LOOKUP_FOLLOW, &old);
+	rcu_read_unlock();
 	if (ret)
 		goto out;
 	ret = kern_path(new_path, LOOKUP_FOLLOW, &new);