Message ID | 49c34d4ede23d28d916eab4a22d4ec698f77f498.1707756893.git.josef@toxicpanda.com (mailing list archive)
State | New, archived
Series | [v2] btrfs: fix deadlock with fiemap and extent locking
On Mon, Feb 12, 2024 at 11:56:02AM -0500, Josef Bacik wrote:
> While working on the patchset to remove extent locking I got a lockdep
> splat with fiemap and pagefaulting with my new extent lock replacement
> lock.
>
> This deadlock exists with our normal code, we just don't have lockdep
> annotations with the extent locking so we've never noticed it.
>
> Since we're copying the fiemap extent to user space on every iteration
> we have the chance of pagefaulting. Because we hold the extent lock for
> the entire range we could mkwrite into a range in the file that we have
> mmap'ed. This would deadlock with the following stack trace
>
> [<0>] lock_extent+0x28d/0x2f0
> [<0>] btrfs_page_mkwrite+0x273/0x8a0
> [<0>] do_page_mkwrite+0x50/0xb0
> [<0>] do_fault+0xc1/0x7b0
> [<0>] __handle_mm_fault+0x2fa/0x460
> [<0>] handle_mm_fault+0xa4/0x330
> [<0>] do_user_addr_fault+0x1f4/0x800
> [<0>] exc_page_fault+0x7c/0x1e0
> [<0>] asm_exc_page_fault+0x26/0x30
> [<0>] rep_movs_alternative+0x33/0x70
> [<0>] _copy_to_user+0x49/0x70
> [<0>] fiemap_fill_next_extent+0xc8/0x120
> [<0>] emit_fiemap_extent+0x4d/0xa0
> [<0>] extent_fiemap+0x7f8/0xad0
> [<0>] btrfs_fiemap+0x49/0x80
> [<0>] __x64_sys_ioctl+0x3e1/0xb50
> [<0>] do_syscall_64+0x94/0x1a0
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> I wrote an fstest to reproduce this deadlock without my replacement lock
> and verified that the deadlock exists with our existing locking.
>
> To fix this simply don't take the extent lock for the entire duration of
> the fiemap. This is safe in general because we keep track of where we
> are when we're searching the tree, so if an ordered extent updates in
> the middle of our fiemap call we'll still emit the correct extents
> because we know what offset we were on before.
>
> The only place we maintain the lock is searching delalloc. Since the
> delalloc stuff can change during writeback we want to lock the extent
> range so we have a consistent view of delalloc at the time we're
> checking to see if we need to set the delalloc flag.
>
> With this patch applied we no longer deadlock with my testcase.
>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> v1->v2:
> - Fixed up the various formatting comments.
> - Added a comment for the locking.

Reviewed-by: David Sterba <dsterba@suse.com>
Hi,

> While working on the patchset to remove extent locking I got a lockdep
> splat with fiemap and pagefaulting with my new extent lock replacement
> lock.
>
> This deadlock exists with our normal code, we just don't have lockdep
> annotations with the extent locking so we've never noticed it.
>
> Since we're copying the fiemap extent to user space on every iteration
> we have the chance of pagefaulting. Because we hold the extent lock for
> the entire range we could mkwrite into a range in the file that we have
> mmap'ed. This would deadlock with the following stack trace
>
> [<0>] lock_extent+0x28d/0x2f0
> [<0>] btrfs_page_mkwrite+0x273/0x8a0
> [<0>] do_page_mkwrite+0x50/0xb0
> [<0>] do_fault+0xc1/0x7b0
> [<0>] __handle_mm_fault+0x2fa/0x460
> [<0>] handle_mm_fault+0xa4/0x330
> [<0>] do_user_addr_fault+0x1f4/0x800
> [<0>] exc_page_fault+0x7c/0x1e0
> [<0>] asm_exc_page_fault+0x26/0x30
> [<0>] rep_movs_alternative+0x33/0x70
> [<0>] _copy_to_user+0x49/0x70
> [<0>] fiemap_fill_next_extent+0xc8/0x120
> [<0>] emit_fiemap_extent+0x4d/0xa0
> [<0>] extent_fiemap+0x7f8/0xad0
> [<0>] btrfs_fiemap+0x49/0x80
> [<0>] __x64_sys_ioctl+0x3e1/0xb50
> [<0>] do_syscall_64+0x94/0x1a0
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> I wrote an fstest to reproduce this deadlock without my replacement lock
> and verified that the deadlock exists with our existing locking.
>
> To fix this simply don't take the extent lock for the entire duration of
> the fiemap. This is safe in general because we keep track of where we
> are when we're searching the tree, so if an ordered extent updates in
> the middle of our fiemap call we'll still emit the correct extents
> because we know what offset we were on before.
>
> The only place we maintain the lock is searching delalloc. Since the
> delalloc stuff can change during writeback we want to lock the extent
> range so we have a consistent view of delalloc at the time we're
> checking to see if we need to set the delalloc flag.
>
> With this patch applied we no longer deadlock with my testcase.

After I applied this patch (upstream b0ad381fa7690244802aed119b478b4bdafc31dd)
to 6.6.18-rc1, fstests generic/561 triggers a WARNING.

reproduce frequency: about 1/5

Or do we need some other patches too?

[22281.809944] ------------[ cut here ]------------
[22281.816133] WARNING: CPU: 12 PID: 2995743 at fs/btrfs/extent_io.c:2457 emit_fiemap_extent+0x84/0x90 [btrfs]
[22281.827552] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_dust dm_flakey ext4 mbcache jbd2 loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs nfnetlink overlay rfkill vfat fat drm_exec amdxcp drm_buddy gpu_sched intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm_intel snd_hda_intel radeon kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core i2c_algo_bit snd_hwdep irqbypass drm_suballoc_helper snd_seq drm_display_helper rapl snd_seq_device cec intel_cstate snd_pcm mei_wdt drm_ttm_helper dcdbas snd_timer iTCO_wdt ttm mei_me dell_smm_hwmon iTCO_vendor_support intel_uncore mei snd video pcspkr i2c_i801 soundcore lpc_ich i2c_smbus fuse xfs sd_mod sg nvme_tcp nvme_fabrics nvme_core ahci libahci nvme_common mpt3sas t10_pi libata e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel raid_class
[22281.827650] scsi_transport_sas wmi dm_multipath btrfs blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi i2c_dev [last unloaded: scsi_debug]
[22281.958863] CPU: 12 PID: 2995743 Comm: pool Tainted: G W 6.6.18-0.1.el7.x86_64 #1
[22281.969394] Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
[22281.978405] RIP: 0010:emit_fiemap_extent+0x84/0x90 [btrfs]
[22281.985631] Code: 2b 4c 89 63 08 4c 89 73 10 44 89 6b 18 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 45 39 c1 75 cc 4c 01 f1 31 c0 48 89 4b 10 eb e3 <0f> 0b b8 ea ff ff ff eb da 0f 1f 00 90 90 90 90 90 90 90 90 90 90
[22282.007771] RSP: 0018:ffffc90026767c28 EFLAGS: 00010206
[22282.014681] RAX: 00000000001d1000 RBX: ffffc90026767d08 RCX: 0000000000005000
[22282.023469] RDX: 00000000001cd000 RSI: 00000000001cc000 RDI: ffffc90026767d90
[22282.032256] RBP: 00000000001cd000 R08: 0000000000008000 R09: 0000000000000000
[22282.041016] R10: 0000000000000000 R11: ffff88a0feeb20e0 R12: 0000000043ec2000
[22282.049759] R13: 0000000000000000 R14: 0000000000008000 R15: ffff88a10e2ca8c0
[22282.058636] FS: 00007fbedbfff640(0000) GS:ffff88b08f500000(0000) knlGS:0000000000000000
[22282.068287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22282.075624] CR2: 00007fbfb401a000 CR3: 0000000512642001 CR4: 00000000001706e0
[22282.084242] Call Trace:
[22282.088280] <TASK>
[22282.091954] ? __warn+0x85/0x140
[22282.096772] ? emit_fiemap_extent+0x84/0x90 [btrfs]
[22282.103359] ? report_bug+0xfc/0x1e0
[22282.108505] ? handle_bug+0x3f/0x70
[22282.113545] ? exc_invalid_op+0x17/0x70
[22282.118902] ? asm_exc_invalid_op+0x1a/0x20
[22282.124639] ? emit_fiemap_extent+0x84/0x90 [btrfs]
[22282.131140] extent_fiemap+0x7d9/0xae0 [btrfs]
[22282.137198] btrfs_fiemap+0x49/0x80 [btrfs]
[22282.143001] do_vfs_ioctl+0x177/0x900
[22282.148220] __x64_sys_ioctl+0x6e/0xd0
[22282.153502] do_syscall_64+0x5c/0x90
[22282.158616] ? __might_fault+0x26/0x30
[22282.163911] ? _copy_to_user+0x49/0x70
[22282.169209] ? do_vfs_ioctl+0x197/0x900
[22282.174579] ? __x64_sys_ioctl+0xab/0xd0
[22282.180035] ? syscall_exit_to_user_mode+0x26/0x40
[22282.186375] ? do_syscall_64+0x6b/0x90
[22282.191650] ? exc_page_fault+0x6b/0x150
[22282.197128] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[22282.203714] RIP: 0033:0x7fbfde23ec6b
[22282.208882] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[22282.231063] RSP: 002b:00007fbedbffeb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[22282.240383] RAX: ffffffffffffffda RBX: 00007fbec4801378 RCX: 00007fbfde23ec6b
[22282.249283] RDX: 00007fbec4801378 RSI: 00000000c020660b RDI: 0000000000000003
[22282.258180] RBP: 00007fbec4801370 R08: 00007fbedbffec90 R09: 00007fbedbffec7c
[22282.266974] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fbedbffec7c
[22282.275703] R13: 00007fbedbffec88 R14: 00007fbedbffec80 R15: 00007fbedbffec90
[22282.284443] </TASK>
[22282.288165] ---[ end trace 0000000000000000 ]---

static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
		if (cache->offset + cache->len > offset) {
L2457:			WARN_ON(1);
			return -EINVAL;
		}

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2024/02/23
On Fri, Feb 23, 2024 at 2:52 AM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> [...]
>
> After I applied this patch(upstream b0ad381fa7690244802aed119b478b4bdafc31dd )
> into 6.6.18-rc1, fstests generic/561 will trigger a WARNING.
>
> reproduce frequency: about 1/5
>
> Or we need some other patches too?
>
> [22281.816133] WARNING: CPU: 12 PID: 2995743 at fs/btrfs/extent_io.c:2457 emit_fiemap_extent+0x84/0x90 [btrfs]
>
> [...]
>
> static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
> 		if (cache->offset + cache->len > offset) {
> L2457:			WARN_ON(1);
> 			return -EINVAL;
> 		}

Yes, this patch triggers a race.
I was just working on a fix for it yesterday.

I've just sent it to the list, it's the first patch in this series:

https://lore.kernel.org/linux-btrfs/cover.1708635463.git.fdmanana@suse.com/

Thanks.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8648ea9b5fb5..dfcde1610b0a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2683,16 +2683,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode,
 	 * it beyond i_size.
 	 */
 	while (cur_offset < end && cur_offset < i_size) {
+		struct extent_state *cached_state = NULL;
 		u64 delalloc_start;
 		u64 delalloc_end;
 		u64 prealloc_start;
+		u64 lockstart;
+		u64 lockend;
 		u64 prealloc_len = 0;
 		bool delalloc;
 
+		lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize);
+		lockend = round_up(end, inode->root->fs_info->sectorsize);
+
+		/*
+		 * We are only locking for the delalloc range because that's the
+		 * only thing that can change here.  With fiemap we have a lock
+		 * on the inode, so no buffered or direct writes can happen.
+		 *
+		 * However mmaps and normal page writeback will cause this to
+		 * change arbitrarily.  We have to lock the extent lock here to
+		 * make sure that nobody messes with the tree while we're doing
+		 * btrfs_find_delalloc_in_range.
+		 */
+		lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 		delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end,
 							delalloc_cached_state,
 							&delalloc_start,
 							&delalloc_end);
+		unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 		if (!delalloc)
 			break;
 
@@ -2860,15 +2878,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		  u64 start, u64 len)
 {
 	const u64 ino = btrfs_ino(inode);
-	struct extent_state *cached_state = NULL;
 	struct extent_state *delalloc_cached_state = NULL;
 	struct btrfs_path *path;
 	struct fiemap_cache cache = { 0 };
 	struct btrfs_backref_share_check_ctx *backref_ctx;
 	u64 last_extent_end;
 	u64 prev_extent_end;
-	u64 lockstart;
-	u64 lockend;
+	u64 range_start;
+	u64 range_end;
+	const u64 sectorsize = inode->root->fs_info->sectorsize;
 	bool stopped = false;
 	int ret;
 
@@ -2879,12 +2897,11 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		goto out;
 	}
 
-	lockstart = round_down(start, inode->root->fs_info->sectorsize);
-	lockend = round_up(start + len, inode->root->fs_info->sectorsize);
-	prev_extent_end = lockstart;
+	range_start = round_down(start, sectorsize);
+	range_end = round_up(start + len, sectorsize);
+	prev_extent_end = range_start;
 
 	btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
-	lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 
 	ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
 	if (ret < 0)
@@ -2892,7 +2909,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	btrfs_release_path(path);
 
 	path->reada = READA_FORWARD;
-	ret = fiemap_search_slot(inode, path, lockstart);
+	ret = fiemap_search_slot(inode, path, range_start);
 	if (ret < 0) {
 		goto out_unlock;
 	} else if (ret > 0) {
@@ -2904,7 +2921,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		goto check_eof_delalloc;
 	}
 
-	while (prev_extent_end < lockend) {
+	while (prev_extent_end < range_end) {
 		struct extent_buffer *leaf = path->nodes[0];
 		struct btrfs_file_extent_item *ei;
 		struct btrfs_key key;
@@ -2927,19 +2944,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		 * The first iteration can leave us at an extent item that ends
 		 * before our range's start. Move to the next item.
 		 */
-		if (extent_end <= lockstart)
+		if (extent_end <= range_start)
 			goto next_item;
 
 		backref_ctx->curr_leaf_bytenr = leaf->start;
 
 		/* We have in implicit hole (NO_HOLES feature enabled). */
 		if (prev_extent_end < key.offset) {
-			const u64 range_end = min(key.offset, lockend) - 1;
+			const u64 hole_end = min(key.offset, range_end) - 1;
 
 			ret = fiemap_process_hole(inode, fieinfo, &cache,
 						  &delalloc_cached_state,
 						  backref_ctx, 0, 0, 0,
-						  prev_extent_end, range_end);
+						  prev_extent_end, hole_end);
 			if (ret < 0) {
 				goto out_unlock;
 			} else if (ret > 0) {
@@ -2949,7 +2966,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 		}
 
 		/* We've reached the end of the fiemap range, stop. */
-		if (key.offset >= lockend) {
+		if (key.offset >= range_end) {
 			stopped = true;
 			break;
 		}
@@ -3043,29 +3060,41 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	btrfs_free_path(path);
 	path = NULL;
 
-	if (!stopped && prev_extent_end < lockend) {
+	if (!stopped && prev_extent_end < range_end) {
 		ret = fiemap_process_hole(inode, fieinfo, &cache,
 					  &delalloc_cached_state, backref_ctx,
-					  0, 0, 0, prev_extent_end, lockend - 1);
+					  0, 0, 0, prev_extent_end, range_end - 1);
 		if (ret < 0)
 			goto out_unlock;
-		prev_extent_end = lockend;
+		prev_extent_end = range_end;
 	}
 
 	if (cache.cached && cache.offset + cache.len >= last_extent_end) {
 		const u64 i_size = i_size_read(&inode->vfs_inode);
 
 		if (prev_extent_end < i_size) {
+			struct extent_state *cached_state = NULL;
 			u64 delalloc_start;
 			u64 delalloc_end;
+			u64 lockstart;
+			u64 lockend;
 			bool delalloc;
 
+			lockstart = round_down(prev_extent_end, sectorsize);
+			lockend = round_up(i_size, sectorsize);
+
+			/*
+			 * See the comment in fiemap_process_hole as to why
+			 * we're doing the locking here.
+			 */
+			lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 			delalloc = btrfs_find_delalloc_in_range(inode,
 								prev_extent_end,
 								i_size - 1,
								&delalloc_cached_state,
								&delalloc_start,
								&delalloc_end);
+			unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 			if (!delalloc)
 				cache.flags |= FIEMAP_EXTENT_LAST;
 		} else {
@@ -3076,7 +3105,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 	ret = emit_last_fiemap_cache(fieinfo, &cache);
 
 out_unlock:
-	unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
 	btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 out:
 	free_extent_state(delalloc_cached_state);