Message ID | 20240508114016.18119-1-jth@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | don't hold dev_replace rwsem over whole of btrfs_map_block | expand |
On Wed, May 8, 2024 at 12:40 PM Johannes Thumshirn <jth@kernel.org> wrote: > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of > btrfs_map_block' sent as a series as I've added a 2nd patch, which > I've came accross while looking at the code. > > @Filipe, unfortunately I can't find the original report from the CI > anymore, so I don't have the stacktrace handy. > > Changes to RFC: > - Incorporated Filipe's review > - Added patch #2 > Link to RFC: > https://lore.kernel.org/linux-btrfs/2454cd4eb1694d37056e492af32b23743c63202b.1714663442.git.jth@kernel.org/ > > Johannes Thumshirn (2): > btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block > btrfs: pass 'struct btrfs_io_geometry' into handle_ops_on_dev_replace > > fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++-------------------- > 1 file changed, 29 insertions(+), 23 deletions(-) Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good, thanks. > > -- > 2.35.3 >
On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of > btrfs_map_block' sent as a series as I've added a 2nd patch, which > I've came accross while looking at the code. > > @Filipe, unfortunately I can't find the original report from the CI > anymore, so I don't have the stacktrace handy. I can reproduce it. Please edit it a bit for the of changelog. [ 1022.275649] ============================================ [ 1022.276151] WARNING: possible recursive locking detected [ 1022.277976] 6.9.0-rc7-default+ #417 Not tainted [ 1022.278419] -------------------------------------------- [ 1022.278916] btrfs/27578 is trying to acquire lock: [ 1022.279372] ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0 [ 1022.280228] but task is already holding lock: [ 1022.280786] ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0 [ 1022.281722] other info that might help us debug this: [ 1022.282405] Possible unsafe locking scenario: [ 1022.282863] CPU0 [ 1022.283087] ---- [ 1022.283311] lock(&fs_info->dev_replace.rwsem); [ 1022.283676] lock(&fs_info->dev_replace.rwsem); [ 1022.284036] *** DEADLOCK *** [ 1022.284526] May be due to missing lock nesting notation [ 1022.285077] 1 lock held by btrfs/27578: [ 1022.285394] #0: ffff888107385ec8 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x25b/0x11e0 [ 1022.286109] stack backtrace: [ 1022.286482] CPU: 3 PID: 27578 Comm: btrfs Not tainted 6.9.0-rc7-default+ #417 [ 1022.287008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [ 1022.287924] Call Trace: [ 1022.288189] <TASK> [ 1022.288433] dump_stack_lvl+0x61/0x80 [ 1022.288799] validate_chain+0x6ef/0xcc0 [ 1022.289179] ? check_prev_add+0x12f0/0x12f0 [ 1022.289581] ? mark_lock+0xe3/0xa10 [ 1022.289936] __lock_acquire+0x8cf/0xda0 [ 1022.290324] lock_acquire+0x149/0x3e0 [ 1022.290673] ? btrfs_map_block+0x25b/0x11e0 [ 1022.291086] ? lock_sync+0xd0/0xd0 [ 1022.291432] ? lock_is_held_type+0x83/0xf0 [ 1022.291831] ? __might_resched+0x162/0x240 [ 1022.292232] ? rcu_read_unlock+0x80/0x80 [ 1022.292617] down_read+0x9a/0x370 [ 1022.292959] ? btrfs_map_block+0x25b/0x11e0 [ 1022.293419] ? rwsem_down_read_slowpath+0x740/0x740 [ 1022.293876] ? do_raw_read_unlock+0x23/0x50 [ 1022.294278] ? _raw_read_unlock+0x1a/0x30 [ 1022.294667] btrfs_map_block+0x25b/0x11e0 [ 1022.295057] ? btrfs_map_discard+0x4e0/0x4e0 [ 1022.295478] ? btrfs_bio_counter_sub+0x70/0x70 [ 1022.295912] btrfs_submit_chunk+0x1a2/0x950 [ 1022.296313] ? btrfs_bio_init+0x80/0x80 [ 1022.296691] ? bio_check_pages_dirty+0x390/0x390 [ 1022.297159] ? do_raw_spin_unlock+0x8d/0xe0 [ 1022.297652] ? alloc_test_extent_buffer+0x2c0/0x2c0 [ 1022.298168] btrfs_submit_bio+0x1c/0x40 [ 1022.298627] read_extent_buffer_pages+0x26e/0x3b0 [ 1022.299068] btrfs_read_extent_buffer+0xe1/0x300 [ 1022.299605] read_tree_block+0x55/0x80 [ 1022.300039] read_block_for_search+0x3ca/0x520 [ 1022.300500] ? btrfs_release_path+0x1e0/0x1e0 [ 1022.300937] ? lock_is_held_type+0x83/0xf0 [ 1022.301414] ? btrfs_search_slot_get_root+0x280/0x520 [ 1022.301961] ? __might_sleep+0x28/0xd0 [ 1022.302360] btrfs_search_slot+0x4a3/0x13b0 [ 1022.302801] ? balance_level+0x15c0/0x15c0 [ 1022.303195] ? rcu_is_watching+0x1c/0x40 [ 1022.303648] ? kmem_cache_alloc+0x262/0x2f0 [ 1022.304076] ? find_held_lock+0x8a/0xa0 [ 1022.304543] btrfs_get_raid_extent_offset+0x14e/0x600 [ 1022.305091] ? btrfs_insert_raid_extent+0x160/0x160 [ 1022.305617] ? down_read+0xb7/0x370 [ 1022.306027] ? down_read+0x2e/0x370 [ 1022.306446] ? set_io_stripe+0x9d/0x140 [ 1022.306924] btrfs_map_block+0x3e7/0x11e0 [ 1022.307404] ? btrfs_map_discard+0x4e0/0x4e0 [ 1022.307917] ? scrub_write_endio+0x1c0/0x1c0 [ 1022.308422] ? btrfs_bio_init+0x67/0x80 [ 1022.308842] scrub_submit_extent_sector_read.isra.0+0x308/0x630 [ 1022.309434] ? scrub_stripe_submit_repair_read+0x440/0x440 [ 1022.310062] ? _find_next_bit+0x37/0xc0 [ 1022.310483] ? scrub_submit_initial_read+0x139/0x3e0 [ 1022.311040] submit_initial_group_read+0x175/0x330 [ 1022.311617] ? reacquire_held_locks+0x280/0x280 [ 1022.312118] ? scrub_submit_initial_read+0x3e0/0x3e0 [ 1022.312725] ? scrub_simple_mirror.isra.0+0x1e9/0x380 [ 1022.313292] scrub_simple_mirror.isra.0+0x2e7/0x380 [ 1022.313808] scrub_stripe+0x76d/0x7d0 [ 1022.314255] ? __lock_acquired+0x1ee/0x400 [ 1022.314740] ? scrub_setup_ctx+0x210/0x210 [ 1022.315212] ? btrfs_find_chunk_map+0x37/0x50 [ 1022.315696] ? do_raw_read_unlock+0x23/0x50 [ 1022.316228] scrub_chunk+0x137/0x200 [ 1022.316690] scrub_enumerate_chunks+0x539/0x960 [ 1022.317169] ? scrub_chunk+0x200/0x200 [ 1022.317608] ? bit_wait_io_timeout+0x10/0xc0 [ 1022.318101] ? __up_read+0x175/0x4c0 [ 1022.318558] ? _down_write_nest_lock+0x190/0x1c0 [ 1022.319067] ? btrfs_dev_name+0xb0/0xb0 [ 1022.319514] ? btrfs_scrub_dev+0x313/0x820 [ 1022.320034] btrfs_scrub_dev+0x3e4/0x820 [ 1022.320518] ? cleanup_transaction+0x100/0x100 [ 1022.321043] ? scrub_print_warning_inode+0x540/0x540 [ 1022.321613] btrfs_dev_replace_start+0x41f/0x550 [ 1022.322150] ? btrfs_dev_replace_finishing+0xa60/0xa60 [ 1022.322735] ? do_raw_spin_trylock+0xc7/0x110 [ 1022.323205] ? do_raw_spin_lock+0x1a0/0x1a0 [ 1022.323581] btrfs_dev_replace_by_ioctl+0x8e/0xf0 [ 1022.323951] btrfs_ioctl+0x4cf/0x17c0 [ 1022.324333] ? do_sigaction+0x234/0x420 [ 1022.324652] ? btrfs_ioctl_get_supported_features+0x20/0x20 [ 1022.325069] ? sigaction_compat_abi+0x10/0x10 [ 1022.325494] ? lock_release+0x9a/0x1a0 [ 1022.325837] ? __x64_sys_rt_sigaction+0x185/0x1b0 [ 1022.326203] ? __x64_sys_sigprocmask+0x230/0x230 [ 1022.326682] ? lockdep_hardirqs_on_prepare+0x13c/0x200 [ 1022.327094] __x64_sys_ioctl+0xbc/0xe0 [ 1022.327407] do_syscall_64+0x5b/0xf0 [ 1022.327745] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 1022.328217] RIP: 0033:0x7f756e53f3df [ 1022.330048] RSP: 002b:00007ffd174401c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1022.330774] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f756e53f3df [ 1022.331499] RDX: 00007ffd174410b0 RSI: 00000000ca289435 RDI: 0000000000000003 [ 1022.332176] RBP: 00007ffd17440280 R08: 0000000000000000 R09: 0000000000000007 [ 1022.332820] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffd17443f9e [ 1022.333452] R13: 0000555f1341a2a0 R14: 0000000000000000 R15: 0000000000000004 [ 1022.334082] </TASK>
On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of > btrfs_map_block' sent as a series as I've added a 2nd patch, which > I've came accross while looking at the code. > > @Filipe, unfortunately I can't find the original report from the CI > anymore, so I don't have the stacktrace handy. And right after the stack trace I found this, it could be the same use-after-free that Filipe fixed, I haven't analyzed it: [ 1040.699694] BUG: KASAN: slab-use-after-free in btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.700735] Read of size 8 at addr ffff888113385880 by task kworker/u32:12/13766 [ 1040.701948] CPU: 3 PID: 13766 Comm: kworker/u32:12 Not tainted 6.9.0-rc7-default+ #417 [ 1040.702834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [ 1040.704167] Workqueue: btrfs-endio-write btrfs_work_helper [ 1040.704861] Call Trace: [ 1040.705242] <TASK> [ 1040.705581] dump_stack_lvl+0x61/0x80 [ 1040.706069] print_address_description.constprop.0+0x75/0x310 [ 1040.706652] ? btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.707315] print_report+0x118/0x216 [ 1040.707837] ? btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.708434] ? __virt_addr_valid+0x11e/0x2b0 [ 1040.708877] ? btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.709507] kasan_report+0x11d/0x1f0 [ 1040.710013] ? btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.710677] btrfs_insert_one_raid_extent+0x1e2/0x430 [ 1040.711313] ? record_root_in_trans+0x164/0x1a0 [ 1040.711781] ? btrfs_lru_cache_clear+0x80/0x80 [ 1040.712232] ? btrfs_record_root_in_trans+0xa8/0xc0 [ 1040.712716] ? start_transaction+0x154/0xa50 [ 1040.713154] btrfs_insert_raid_extent+0x9d/0x160 [ 1040.713681] btrfs_finish_one_ordered+0x5af/0xb60 [ 1040.714220] ? rcu_is_watching+0x1c/0x40 [ 1040.714712] ? lock_acquire+0xc1/0x3e0 [ 1040.715220] ? btrfs_unlink_subvol+0x650/0x650 [ 1040.715705] ? do_raw_spin_trylock+0xc7/0x110 [ 1040.716274] btrfs_work_helper+0x118/0x310 [ 1040.716822] ? rcu_is_watching+0x1c/0x40 [ 1040.717329] process_one_work+0x4be/0x9f0 [ 1040.717750] ? pwq_dec_nr_in_flight+0x250/0x250 [ 1040.718350] ? __list_add_valid_or_report+0x3f/0x70 [ 1040.718980] worker_thread+0x38d/0x680 [ 1040.720200] ? __kthread_parkme+0xc8/0xe0 [ 1040.720764] ? rescuer_thread+0x5e0/0x5e0 [ 1040.721307] kthread+0x171/0x1b0 [ 1040.723690] ? kthread+0xe5/0x1b0 [ 1040.724181] ? kthread_complete_and_exit+0x20/0x20 [ 1040.724782] ret_from_fork+0x28/0x50 [ 1040.726471] ? kthread_complete_and_exit+0x20/0x20 [ 1040.726930] ret_from_fork_asm+0x11/0x20 [ 1040.727415] </TASK> [ 1040.727863] Allocated by task 27528: [ 1040.728109] kasan_save_stack+0x37/0x60 [ 1040.728360] kasan_save_track+0x10/0x30 [ 1040.728602] __kasan_kmalloc+0x83/0x90 [ 1040.728845] btrfs_alloc_device+0xa4/0x2e0 [ 1040.729094] btrfs_init_dev_replace_tgtdev.constprop.0+0x1a4/0x4b0 [ 1040.729444] btrfs_dev_replace_start+0xdf/0x550 [ 1040.729713] btrfs_dev_replace_by_ioctl+0x8e/0xf0 [ 1040.730010] btrfs_ioctl+0x4cf/0x17c0 [ 1040.730292] __x64_sys_ioctl+0xbc/0xe0 [ 1040.730836] do_syscall_64+0x5b/0xf0 [ 1040.731352] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 1040.732308] Freed by task 27578: [ 1040.732780] kasan_save_stack+0x37/0x60 [ 1040.733458] kasan_save_track+0x10/0x30 [ 1040.733982] kasan_save_free_info+0x37/0x50 [ 1040.734454] poison_slab_object+0x19d/0x1e0 [ 1040.734908] __kasan_slab_free+0x10/0x30 [ 1040.735287] kfree+0xf8/0x2d0 [ 1040.735750] btrfs_rm_dev_replace_free_srcdev+0x7a/0x150 [ 1040.736377] btrfs_dev_replace_finishing+0x90d/0xa60 [ 1040.736983] btrfs_dev_replace_start+0x429/0x550 [ 1040.737552] btrfs_dev_replace_by_ioctl+0x8e/0xf0 [ 1040.738113] btrfs_ioctl+0x4cf/0x17c0 [ 1040.738601] __x64_sys_ioctl+0xbc/0xe0 [ 1040.739155] do_syscall_64+0x5b/0xf0 [ 1040.739759] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 1040.740712] The buggy address belongs to the object at ffff888113385800 which belongs to the cache kmalloc-1k of size 1024 [ 1040.741932] The buggy address is located 128 bytes inside of freed 1024-byte region [ffff888113385800, ffff888113385c00) [ 1040.743275] The buggy address belongs to the physical page: [ 1040.743997] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113380 [ 1040.745033] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 1040.745867] flags: 0x4500000000840(slab|head|section=34|zone=2) [ 1040.746766] page_type: 0xffffffff() [ 1040.747241] raw: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10 [ 1040.747981] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 [ 1040.748723] head: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10 [ 1040.749474] head: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 [ 1040.750291] head: 0004500000000003 ffff88813f2ce001 dead000000000122 00000000ffffffff [ 1040.751227] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1040.751961] page dumped because: kasan: bad access detected [ 1040.752735] Memory state around the buggy address: [ 1040.753213] ffff888113385780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1040.753960] ffff888113385800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1040.754660] >ffff888113385880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1040.755360] ^ [ 1040.755725] ffff888113385900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1040.756435] ffff888113385980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1040.757274] ================================================================== [ 1041.613686] BTRFS info (device vdd): scrub: started on devid 1 [ 1041.613740] BTRFS info (device vdd): scrub: started on devid 2 [ 1042.472852] assertion failed: i < stripe->nr_sectors, in fs/btrfs/scrub.c:753 [ 1042.473680] ------------[ cut here ]------------ [ 1042.474180] kernel BUG at fs/btrfs/scrub.c:753! [ 1042.474550] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1042.474954] CPU: 0 PID: 27589 Comm: btrfs Tainted: G B 6.9.0-rc7-default+ #417 [ 1042.475640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [ 1042.476821] RIP: 0010:calc_sector_number.cold+0x3d/0x5d [ 1042.477428] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5 [ 1042.479196] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286 [ 1042.479840] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000 [ 1042.480706] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7 [ 1042.481511] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c [ 1042.482360] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080 [ 1042.483170] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72 [ 1042.483957] FS: 00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000 [ 1042.484860] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1042.485584] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0 [ 1042.486319] Call Trace: [ 1042.486620] <TASK> [ 1042.486940] ? show_trace_log_lvl+0x1e9/0x2e0 [ 1042.487501] ? scrub_read_endio+0x5a/0x290 [ 1042.488060] ? __die+0x4f/0x8c [ 1042.488513] ? die+0x25/0x40 [ 1042.488855] ? do_trap+0x125/0x180 [ 1042.489277] ? calc_sector_number.cold+0x3d/0x5d [ 1042.489878] ? calc_sector_number.cold+0x3d/0x5d [ 1042.490488] ? do_error_trap+0x8b/0x130 [ 1042.491002] ? calc_sector_number.cold+0x3d/0x5d [ 1042.491541] ? handle_invalid_op+0x38/0x40 [ 1042.492009] ? calc_sector_number.cold+0x3d/0x5d [ 1042.492542] ? exc_invalid_op+0x29/0x40 [ 1042.493035] ? asm_exc_invalid_op+0x16/0x20 [ 1042.493575] ? this_cpu_in_panic+0x1d/0x30 [ 1042.494034] ? calc_sector_number.cold+0x3d/0x5d [ 1042.494535] scrub_read_endio+0x5a/0x290 [ 1042.495058] scrub_submit_extent_sector_read.isra.0+0x60a/0x630 [ 1042.495695] ? scrub_stripe_submit_repair_read+0x440/0x440 [ 1042.496321] ? _find_next_bit+0x37/0xc0 [ 1042.496855] ? scrub_submit_initial_read+0x139/0x3e0 [ 1042.497367] submit_initial_group_read+0x175/0x330 [ 1042.497926] ? lock_sync+0xd0/0xd0 [ 1042.498329] ? scrub_submit_initial_read+0x3e0/0x3e0 [ 1042.498885] ? rcu_read_unlock+0x80/0x80 [ 1042.499383] scrub_simple_mirror.isra.0+0x2e7/0x380 [ 1042.499965] scrub_stripe+0x76d/0x7d0 [ 1042.500469] ? rcu_is_watching+0x1c/0x40 [ 1042.500898] ? rcu_is_watching+0x1c/0x40 [ 1042.501399] ? scrub_setup_ctx+0x210/0x210 [ 1042.501887] ? __down_write_trylock+0xf8/0x260 [ 1042.502428] ? down_read_trylock+0x50/0x50 [ 1042.502977] ? __might_resched+0x162/0x240 [ 1042.503462] ? rcu_read_unlock+0x80/0x80 [ 1042.503980] ? do_raw_read_unlock+0x23/0x50 [ 1042.504501] scrub_chunk+0x137/0x200 [ 1042.505023] scrub_enumerate_chunks+0x539/0x960 [ 1042.505594] ? scrub_chunk+0x200/0x200 [ 1042.506084] ? do_raw_spin_unlock+0x8d/0xe0 [ 1042.506562] ? bit_wait_io_timeout+0x10/0xc0 [ 1042.507087] ? do_raw_spin_trylock+0xc7/0x110 [ 1042.507635] btrfs_scrub_dev+0x3e4/0x820 [ 1042.508105] ? scrub_print_warning_inode+0x540/0x540 [ 1042.508666] ? mnt_get_write_access+0x6a/0x120 [ 1042.509177] ? mnt_get_write_access+0xfd/0x120 [ 1042.509788] btrfs_ioctl+0x79d/0x17c0 [ 1042.510236] ? btrfs_ioctl_get_supported_features+0x20/0x20 [ 1042.510852] ? lock_sync+0xd0/0xd0 [ 1042.511274] ? do_raw_spin_trylock+0xc7/0x110 [ 1042.511751] ? do_raw_spin_lock+0x1a0/0x1a0 [ 1042.512374] ? rcu_is_watching+0x1c/0x40 [ 1042.512957] ? rcu_is_watching+0x1c/0x40 [ 1042.513538] ? lock_release+0x52/0x1a0 [ 1042.514104] ? __fget_files+0x13b/0x230 [ 1042.514669] __x64_sys_ioctl+0xbc/0xe0 [ 1042.515235] do_syscall_64+0x5b/0xf0 [ 1042.515777] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 1042.516471] RIP: 0033:0x7fd369fa03df [ 1042.516968] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 [ 1042.518755] RSP: 002b:00007fd36967dc80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1042.519784] RAX: ffffffffffffffda RBX: 00005627eeea1758 RCX: 00007fd369fa03df [ 1042.520548] RDX: 00005627eeea1758 RSI: 00000000c400941b RDI: 0000000000000003 [ 1042.521301] RBP: 0000000000000000 R08: 00007fd36967e6c0 R09: 32203a6b63617473 [ 1042.521944] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffdb8 [ 1042.522786] R13: 0000000000000002 R14: 00007ffecdb6f450 R15: 00007fd368e7e000 [ 1042.523695] </TASK> [ 1042.523965] Modules linked in: loop [ 1042.524367] ---[ end trace 0000000000000000 ]--- [ 1042.524856] RIP: 0010:calc_sector_number.cold+0x3d/0x5d [ 1042.525382] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5 [ 1042.527082] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286 [ 1042.527686] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000 [ 1042.528388] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7 [ 1042.529109] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c [ 1042.529830] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080 [ 1042.530583] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72 [ 1042.531412] FS: 00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000 [ 1042.534519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1042.535162] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0
On 09.05.24 19:40, David Sterba wrote: > On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote: >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> >> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of >> btrfs_map_block' sent as a series as I've added a 2nd patch, which >> I've came accross while looking at the code. >> >> @Filipe, unfortunately I can't find the original report from the CI >> anymore, so I don't have the stacktrace handy. > > I can reproduce it. Please edit it a bit for the of changelog. > Thanks a lot, added it to the changelog.
On 09.05.24 19:43, David Sterba wrote: > On Wed, May 08, 2024 at 01:40:14PM +0200, Johannes Thumshirn wrote: >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> >> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of >> btrfs_map_block' sent as a series as I've added a 2nd patch, which >> I've came accross while looking at the code. >> >> @Filipe, unfortunately I can't find the original report from the CI >> anymore, so I don't have the stacktrace handy. > > And right after the stack trace I found this, it could be the same > use-after-free that Filipe fixed, I haven't analyzed it: From a first glance, this looks different but related. I /think/ we need to protect the btrfs_device in btrfs_insert_one_raid_extent with the dev_replace rwsem as well. I'll look into it. > [ 1040.699694] BUG: KASAN: slab-use-after-free in btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.700735] Read of size 8 at addr ffff888113385880 by task kworker/u32:12/13766 > > [ 1040.701948] CPU: 3 PID: 13766 Comm: kworker/u32:12 Not tainted 6.9.0-rc7-default+ #417 > [ 1040.702834] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 > [ 1040.704167] Workqueue: btrfs-endio-write btrfs_work_helper > [ 1040.704861] Call Trace: > [ 1040.705242] <TASK> > [ 1040.705581] dump_stack_lvl+0x61/0x80 > [ 1040.706069] print_address_description.constprop.0+0x75/0x310 > [ 1040.706652] ? btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.707315] print_report+0x118/0x216 > [ 1040.707837] ? btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.708434] ? __virt_addr_valid+0x11e/0x2b0 > [ 1040.708877] ? btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.709507] kasan_report+0x11d/0x1f0 > [ 1040.710013] ? btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.710677] btrfs_insert_one_raid_extent+0x1e2/0x430 > [ 1040.711313] ? record_root_in_trans+0x164/0x1a0 > [ 1040.711781] ? btrfs_lru_cache_clear+0x80/0x80 > [ 1040.712232] ? btrfs_record_root_in_trans+0xa8/0xc0 > [ 1040.712716] ? start_transaction+0x154/0xa50 > [ 1040.713154] btrfs_insert_raid_extent+0x9d/0x160 > [ 1040.713681] btrfs_finish_one_ordered+0x5af/0xb60 > [ 1040.714220] ? rcu_is_watching+0x1c/0x40 > [ 1040.714712] ? lock_acquire+0xc1/0x3e0 > [ 1040.715220] ? btrfs_unlink_subvol+0x650/0x650 > [ 1040.715705] ? do_raw_spin_trylock+0xc7/0x110 > [ 1040.716274] btrfs_work_helper+0x118/0x310 > [ 1040.716822] ? rcu_is_watching+0x1c/0x40 > [ 1040.717329] process_one_work+0x4be/0x9f0 > [ 1040.717750] ? pwq_dec_nr_in_flight+0x250/0x250 > [ 1040.718350] ? __list_add_valid_or_report+0x3f/0x70 > [ 1040.718980] worker_thread+0x38d/0x680 > [ 1040.720200] ? __kthread_parkme+0xc8/0xe0 > [ 1040.720764] ? rescuer_thread+0x5e0/0x5e0 > [ 1040.721307] kthread+0x171/0x1b0 > [ 1040.723690] ? kthread+0xe5/0x1b0 > [ 1040.724181] ? kthread_complete_and_exit+0x20/0x20 > [ 1040.724782] ret_from_fork+0x28/0x50 > [ 1040.726471] ? kthread_complete_and_exit+0x20/0x20 > [ 1040.726930] ret_from_fork_asm+0x11/0x20 > [ 1040.727415] </TASK> > > [ 1040.727863] Allocated by task 27528: > [ 1040.728109] kasan_save_stack+0x37/0x60 > [ 1040.728360] kasan_save_track+0x10/0x30 > [ 1040.728602] __kasan_kmalloc+0x83/0x90 > [ 1040.728845] btrfs_alloc_device+0xa4/0x2e0 > [ 1040.729094] btrfs_init_dev_replace_tgtdev.constprop.0+0x1a4/0x4b0 > [ 1040.729444] btrfs_dev_replace_start+0xdf/0x550 > [ 1040.729713] btrfs_dev_replace_by_ioctl+0x8e/0xf0 > [ 1040.730010] btrfs_ioctl+0x4cf/0x17c0 > [ 1040.730292] __x64_sys_ioctl+0xbc/0xe0 > [ 1040.730836] do_syscall_64+0x5b/0xf0 > [ 1040.731352] entry_SYSCALL_64_after_hwframe+0x4b/0x53 > > [ 1040.732308] Freed by task 27578: > [ 1040.732780] kasan_save_stack+0x37/0x60 > [ 1040.733458] kasan_save_track+0x10/0x30 > [ 1040.733982] kasan_save_free_info+0x37/0x50 > [ 1040.734454] poison_slab_object+0x19d/0x1e0 > [ 1040.734908] __kasan_slab_free+0x10/0x30 > [ 1040.735287] kfree+0xf8/0x2d0 > [ 1040.735750] btrfs_rm_dev_replace_free_srcdev+0x7a/0x150 > [ 1040.736377] btrfs_dev_replace_finishing+0x90d/0xa60 > [ 1040.736983] btrfs_dev_replace_start+0x429/0x550 > [ 1040.737552] btrfs_dev_replace_by_ioctl+0x8e/0xf0 > [ 1040.738113] btrfs_ioctl+0x4cf/0x17c0 > [ 1040.738601] __x64_sys_ioctl+0xbc/0xe0 > [ 1040.739155] do_syscall_64+0x5b/0xf0 > [ 1040.739759] entry_SYSCALL_64_after_hwframe+0x4b/0x53 > > [ 1040.740712] The buggy address belongs to the object at ffff888113385800 > which belongs to the cache kmalloc-1k of size 1024 > [ 1040.741932] The buggy address is located 128 bytes inside of > freed 1024-byte region [ffff888113385800, ffff888113385c00) > > [ 1040.743275] The buggy address belongs to the physical page: > [ 1040.743997] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x113380 > [ 1040.745033] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 > [ 1040.745867] flags: 0x4500000000840(slab|head|section=34|zone=2) > [ 1040.746766] page_type: 0xffffffff() > [ 1040.747241] raw: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10 > [ 1040.747981] raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 > [ 1040.748723] head: 0004500000000840 ffff8881000430c0 ffff88813f3c7210 ffff88813f304e10 > [ 1040.749474] head: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000 > [ 1040.750291] head: 0004500000000003 ffff88813f2ce001 dead000000000122 00000000ffffffff > [ 1040.751227] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000 > [ 1040.751961] page dumped because: kasan: bad access detected > > [ 1040.752735] Memory state around the buggy address: > [ 1040.753213] ffff888113385780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > [ 1040.753960] ffff888113385800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 1040.754660] >ffff888113385880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 1040.755360] ^ > [ 1040.755725] ffff888113385900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 1040.756435] ffff888113385980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > [ 1040.757274] ================================================================== > [ 1041.613686] BTRFS info (device vdd): scrub: started on devid 1 > [ 1041.613740] BTRFS info (device vdd): scrub: started on devid 2 > [ 1042.472852] assertion failed: i < stripe->nr_sectors, in fs/btrfs/scrub.c:753 > [ 1042.473680] ------------[ cut here ]------------ > [ 1042.474180] kernel BUG at fs/btrfs/scrub.c:753! > [ 1042.474550] invalid opcode: 0000 [#1] PREEMPT SMP KASAN > [ 1042.474954] CPU: 0 PID: 27589 Comm: btrfs Tainted: G B 6.9.0-rc7-default+ #417 > [ 1042.475640] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 > [ 1042.476821] RIP: 0010:calc_sector_number.cold+0x3d/0x5d > [ 1042.477428] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5 > [ 1042.479196] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286 > [ 1042.479840] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000 > [ 1042.480706] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7 > [ 1042.481511] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c > [ 1042.482360] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080 > [ 1042.483170] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72 > [ 1042.483957] FS: 00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000 > [ 1042.484860] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1042.485584] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0 > [ 1042.486319] Call Trace: > [ 1042.486620] <TASK> > [ 1042.486940] ? show_trace_log_lvl+0x1e9/0x2e0 > [ 1042.487501] ? scrub_read_endio+0x5a/0x290 > [ 1042.488060] ? __die+0x4f/0x8c > [ 1042.488513] ? die+0x25/0x40 > [ 1042.488855] ? do_trap+0x125/0x180 > [ 1042.489277] ? calc_sector_number.cold+0x3d/0x5d > [ 1042.489878] ? calc_sector_number.cold+0x3d/0x5d > [ 1042.490488] ? do_error_trap+0x8b/0x130 > [ 1042.491002] ? calc_sector_number.cold+0x3d/0x5d > [ 1042.491541] ? handle_invalid_op+0x38/0x40 > [ 1042.492009] ? calc_sector_number.cold+0x3d/0x5d > [ 1042.492542] ? exc_invalid_op+0x29/0x40 > [ 1042.493035] ? asm_exc_invalid_op+0x16/0x20 > [ 1042.493575] ? this_cpu_in_panic+0x1d/0x30 > [ 1042.494034] ? calc_sector_number.cold+0x3d/0x5d > [ 1042.494535] scrub_read_endio+0x5a/0x290 > [ 1042.495058] scrub_submit_extent_sector_read.isra.0+0x60a/0x630 > [ 1042.495695] ? scrub_stripe_submit_repair_read+0x440/0x440 > [ 1042.496321] ? _find_next_bit+0x37/0xc0 > [ 1042.496855] ? scrub_submit_initial_read+0x139/0x3e0 > [ 1042.497367] submit_initial_group_read+0x175/0x330 > [ 1042.497926] ? lock_sync+0xd0/0xd0 > [ 1042.498329] ? scrub_submit_initial_read+0x3e0/0x3e0 > [ 1042.498885] ? rcu_read_unlock+0x80/0x80 > [ 1042.499383] scrub_simple_mirror.isra.0+0x2e7/0x380 > [ 1042.499965] scrub_stripe+0x76d/0x7d0 > [ 1042.500469] ? rcu_is_watching+0x1c/0x40 > [ 1042.500898] ? rcu_is_watching+0x1c/0x40 > [ 1042.501399] ? scrub_setup_ctx+0x210/0x210 > [ 1042.501887] ? __down_write_trylock+0xf8/0x260 > [ 1042.502428] ? down_read_trylock+0x50/0x50 > [ 1042.502977] ? __might_resched+0x162/0x240 > [ 1042.503462] ? rcu_read_unlock+0x80/0x80 > [ 1042.503980] ? do_raw_read_unlock+0x23/0x50 > [ 1042.504501] scrub_chunk+0x137/0x200 > [ 1042.505023] scrub_enumerate_chunks+0x539/0x960 > [ 1042.505594] ? scrub_chunk+0x200/0x200 > [ 1042.506084] ? do_raw_spin_unlock+0x8d/0xe0 > [ 1042.506562] ? bit_wait_io_timeout+0x10/0xc0 > [ 1042.507087] ? do_raw_spin_trylock+0xc7/0x110 > [ 1042.507635] btrfs_scrub_dev+0x3e4/0x820 > [ 1042.508105] ? scrub_print_warning_inode+0x540/0x540 > [ 1042.508666] ? mnt_get_write_access+0x6a/0x120 > [ 1042.509177] ? mnt_get_write_access+0xfd/0x120 > [ 1042.509788] btrfs_ioctl+0x79d/0x17c0 > [ 1042.510236] ? btrfs_ioctl_get_supported_features+0x20/0x20 > [ 1042.510852] ? lock_sync+0xd0/0xd0 > [ 1042.511274] ? do_raw_spin_trylock+0xc7/0x110 > [ 1042.511751] ? do_raw_spin_lock+0x1a0/0x1a0 > [ 1042.512374] ? rcu_is_watching+0x1c/0x40 > [ 1042.512957] ? rcu_is_watching+0x1c/0x40 > [ 1042.513538] ? lock_release+0x52/0x1a0 > [ 1042.514104] ? __fget_files+0x13b/0x230 > [ 1042.514669] __x64_sys_ioctl+0xbc/0xe0 > [ 1042.515235] do_syscall_64+0x5b/0xf0 > [ 1042.515777] entry_SYSCALL_64_after_hwframe+0x4b/0x53 > [ 1042.516471] RIP: 0033:0x7fd369fa03df > [ 1042.516968] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00 > [ 1042.518755] RSP: 002b:00007fd36967dc80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 1042.519784] RAX: ffffffffffffffda RBX: 00005627eeea1758 RCX: 00007fd369fa03df > [ 1042.520548] RDX: 00005627eeea1758 RSI: 00000000c400941b RDI: 0000000000000003 > [ 1042.521301] RBP: 0000000000000000 R08: 00007fd36967e6c0 R09: 32203a6b63617473 > [ 1042.521944] R10: 0000000000000000 R11: 0000000000000246 R12: fffffffffffffdb8 > [ 1042.522786] R13: 0000000000000002 R14: 00007ffecdb6f450 R15: 00007fd368e7e000 > [ 1042.523695] </TASK> > [ 1042.523965] Modules linked in: loop > [ 1042.524367] ---[ end trace 0000000000000000 ]--- > [ 1042.524856] RIP: 0010:calc_sector_number.cold+0x3d/0x5d > [ 1042.525382] Code: a5 ff 8b 4c 24 08 e9 a9 8d 7c ff b9 f1 02 00 00 48 c7 c2 00 ac 38 a4 48 c7 c6 e0 b1 38 a4 48 c7 c7 c0 ac 38 a4 e8 0e 8d fb ff <0f> 0b 89 ca 48 89 ee 48 c7 c7 c0 1b ea a4 89 4c 24 08 e8 d7 52 a5 > [ 1042.527082] RSP: 0018:ffff8881066cf5a0 EFLAGS: 00010286 > [ 1042.527686] RAX: 0000000000000041 RBX: ffff8881033e89c0 RCX: 0000000000000000 > [ 1042.528388] RDX: 0000000000000041 RSI: ffffffffa3137b3d RDI: ffffed1020cd9ea7 > [ 1042.529109] RBP: 000000000000000f R08: 0000000000000000 R09: fffffbfff4957b9c > [ 1042.529830] R10: 0000000000000003 R11: 0000000000000001 R12: ffff88810654f080 > [ 1042.530583] R13: 0000000000000010 R14: ffff88813f3c6ec0 R15: ffff8881033e8a72 > [ 1042.531412] FS: 00007fd36967e6c0(0000) GS:ffff888119600000(0000) knlGS:0000000000000000 > [ 1042.534519] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1042.535162] CR2: 00005586a760b560 CR3: 0000000100e82000 CR4: 00000000000006b0 >
From: Johannes Thumshirn <johannes.thumshirn@wdc.com> This is the v2 of 'btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block' sent as a series as I've added a 2nd patch, which I've came accross while looking at the code. @Filipe, unfortunately I can't find the original report from the CI anymore, so I don't have the stacktrace handy. Changes to RFC: - Incorporated Filipe's review - Added patch #2 Link to RFC: https://lore.kernel.org/linux-btrfs/2454cd4eb1694d37056e492af32b23743c63202b.1714663442.git.jth@kernel.org/ Johannes Thumshirn (2): btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block btrfs: pass 'struct btrfs_io_geometry' into handle_ops_on_dev_replace fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 23 deletions(-)