diff mbox series

btrfs: zoned: fix extent unlock in cow_file_range()

Message ID baa48c5a32ae079b218613cbdae175f2387cd745.1739948529.git.naohiro.aota@wdc.com (mailing list archive)
State New
Headers show
Series btrfs: zoned: fix extent unlock in cow_file_range() | expand

Commit Message

Naohiro Aota Feb. 19, 2025, 7:02 a.m. UTC
Running generic/751 on the btrfs for-next often results in hung like
below. They are both stack by locking an extent. This suggests someone
forget to unlock an extent.

    INFO: task kworker/u128:1:12 blocked for more than 323 seconds.
          Not tainted 6.13.0-BTRFS-ZNS+ #503
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u128:1  state:D stack:0     pid:12    tgid:12    ppid:2      flags:0x00004000
    Workqueue: btrfs-fixup btrfs_work_helper [btrfs]
    Call Trace:
     <TASK>
     __schedule+0x534/0xdd0
     schedule+0x39/0x140
     __lock_extent+0x31b/0x380 [btrfs]
     ? __pfx_autoremove_wake_function+0x10/0x10
     btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs]
     btrfs_work_helper+0xff/0x480 [btrfs]
     ? lock_release+0x178/0x2c0
     process_one_work+0x1ee/0x570
     ? srso_return_thunk+0x5/0x5f
     worker_thread+0x1d1/0x3b0
     ? __pfx_worker_thread+0x10/0x10
     kthread+0x10b/0x230
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x30/0x50
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>
    INFO: task kworker/u134:0:184 blocked for more than 323 seconds.
          Not tainted 6.13.0-BTRFS-ZNS+ #503
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u134:0  state:D stack:0     pid:184   tgid:184   ppid:2      flags:0x00004000
    Workqueue: writeback wb_workfn (flush-btrfs-4)
    Call Trace:
     <TASK>
     __schedule+0x534/0xdd0
     schedule+0x39/0x140
     __lock_extent+0x31b/0x380 [btrfs]
     ? __pfx_autoremove_wake_function+0x10/0x10
     find_lock_delalloc_range+0xdb/0x260 [btrfs]
     writepage_delalloc+0x12f/0x500 [btrfs]
     ? srso_return_thunk+0x5/0x5f
     extent_write_cache_pages+0x232/0x840 [btrfs]
     btrfs_writepages+0x72/0x130 [btrfs]
     do_writepages+0xe7/0x260
     ? srso_return_thunk+0x5/0x5f
     ? lock_acquire+0xd2/0x300
     ? srso_return_thunk+0x5/0x5f
     ? find_held_lock+0x2b/0x80
     ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
     ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
     __writeback_single_inode+0x5c/0x4b0
     writeback_sb_inodes+0x22d/0x550
     __writeback_inodes_wb+0x4c/0xe0
     wb_writeback+0x2f6/0x3f0
     wb_workfn+0x32a/0x510
     process_one_work+0x1ee/0x570
     ? srso_return_thunk+0x5/0x5f
     worker_thread+0x1d1/0x3b0
     ? __pfx_worker_thread+0x10/0x10
     kthread+0x10b/0x230
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x30/0x50
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30
     </TASK>

This happens because we have another success path for the zoned mode. When
there is no active zone available, btrfs_reserve_extent() returns
-EAGAIN. In this case, we have two reactions. (1) If the given range is
never allocated, we can only wait for someone to finish a zone, so wait on
BTRFS_FS_NEED_ZONE_FINISH bit and retry afterward. (2) Or, if some
allocations are already done, we must bail out and let the caller to send
IOs for the allocation. This is because these IOs may be necessary to
finish a zone.

The commit 06f364284794 ("btrfs: do proper folio cleanup when
cow_file_range() failed") moved the unlock code from the inside of the loop
to the outside. So, previously, the allocated extents are unlocked just
after the allocation and so before returning from the function. However,
they are no longer unlocked on the case (2) above. That caused the hung
issue.

Fix the issue by modifying the 'end' to the end of the allocated
range. Then, we can exit the loop and the same unlock code can properly
handle the case.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
CC: stable@vger.kernel.org
Fixes: 06f364284794 ("btrfs: do proper folio cleanup when cow_file_range() failed")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/inode.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Qu Wenruo Feb. 19, 2025, 7:16 a.m. UTC | #1
在 2025/2/19 17:32, Naohiro Aota 写道:
> Running generic/751 on the btrfs for-next often results in hung like
> below. They are both stack by locking an extent. This suggests someone
> forget to unlock an extent.
> 
>      INFO: task kworker/u128:1:12 blocked for more than 323 seconds.
>            Not tainted 6.13.0-BTRFS-ZNS+ #503
>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>      task:kworker/u128:1  state:D stack:0     pid:12    tgid:12    ppid:2      flags:0x00004000
>      Workqueue: btrfs-fixup btrfs_work_helper [btrfs]
>      Call Trace:
>       <TASK>
>       __schedule+0x534/0xdd0
>       schedule+0x39/0x140
>       __lock_extent+0x31b/0x380 [btrfs]
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       btrfs_writepage_fixup_worker+0xf1/0x3a0 [btrfs]
>       btrfs_work_helper+0xff/0x480 [btrfs]
>       ? lock_release+0x178/0x2c0
>       process_one_work+0x1ee/0x570
>       ? srso_return_thunk+0x5/0x5f
>       worker_thread+0x1d1/0x3b0
>       ? __pfx_worker_thread+0x10/0x10
>       kthread+0x10b/0x230
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0x30/0x50
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
>      INFO: task kworker/u134:0:184 blocked for more than 323 seconds.
>            Not tainted 6.13.0-BTRFS-ZNS+ #503
>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>      task:kworker/u134:0  state:D stack:0     pid:184   tgid:184   ppid:2      flags:0x00004000
>      Workqueue: writeback wb_workfn (flush-btrfs-4)
>      Call Trace:
>       <TASK>
>       __schedule+0x534/0xdd0
>       schedule+0x39/0x140
>       __lock_extent+0x31b/0x380 [btrfs]
>       ? __pfx_autoremove_wake_function+0x10/0x10
>       find_lock_delalloc_range+0xdb/0x260 [btrfs]
>       writepage_delalloc+0x12f/0x500 [btrfs]
>       ? srso_return_thunk+0x5/0x5f
>       extent_write_cache_pages+0x232/0x840 [btrfs]
>       btrfs_writepages+0x72/0x130 [btrfs]
>       do_writepages+0xe7/0x260
>       ? srso_return_thunk+0x5/0x5f
>       ? lock_acquire+0xd2/0x300
>       ? srso_return_thunk+0x5/0x5f
>       ? find_held_lock+0x2b/0x80
>       ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
>       ? wbc_attach_and_unlock_inode.part.0+0x102/0x250
>       __writeback_single_inode+0x5c/0x4b0
>       writeback_sb_inodes+0x22d/0x550
>       __writeback_inodes_wb+0x4c/0xe0
>       wb_writeback+0x2f6/0x3f0
>       wb_workfn+0x32a/0x510
>       process_one_work+0x1ee/0x570
>       ? srso_return_thunk+0x5/0x5f
>       worker_thread+0x1d1/0x3b0
>       ? __pfx_worker_thread+0x10/0x10
>       kthread+0x10b/0x230
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork+0x30/0x50
>       ? __pfx_kthread+0x10/0x10
>       ret_from_fork_asm+0x1a/0x30
>       </TASK>
> 
> This happens because we have another success path for the zoned mode. When
> there is no active zone available, btrfs_reserve_extent() returns
> -EAGAIN. In this case, we have two reactions. (1) If the given range is
> never allocated, we can only wait for someone to finish a zone, so wait on
> BTRFS_FS_NEED_ZONE_FINISH bit and retry afterward. (2) Or, if some
> allocations are already done, we must bail out and let the caller to send
> IOs for the allocation. This is because these IOs may be necessary to
> finish a zone.
> 
> The commit 06f364284794 ("btrfs: do proper folio cleanup when
> cow_file_range() failed") moved the unlock code from the inside of the loop
> to the outside. So, previously, the allocated extents are unlocked just
> after the allocation and so before returning from the function. However,
> they are no longer unlocked on the case (2) above. That caused the hung
> issue.
> 
> Fix the issue by modifying the 'end' to the end of the allocated
> range. Then, we can exit the loop and the same unlock code can properly
> handle the case.
> 
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> CC: stable@vger.kernel.org
> Fixes: 06f364284794 ("btrfs: do proper folio cleanup when cow_file_range() failed")

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks for catching this, at the time of crafting that series, I had a 
bad feeling that there would be some bugs related to zoned support, and 
unfortunately this turns out to be true...

Thanks,
Qu

> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   fs/btrfs/inode.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 1512eb94b6e5..f80db81fc853 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1378,8 +1378,14 @@ static noinline int cow_file_range(struct btrfs_inode *inode,
>   				continue;
>   			}
>   			if (done_offset) {
> -				*done_offset = start - 1;
> -				return 0;
> +				/*
> +				 * Move @end to the end of the processed range,
> +				 * and exit the loop to unlock the processed
> +				 * extents.
> +				 */
> +				end = start - 1;
> +				ret = 0;
> +				break;
>   			}
>   			ret = -ENOSPC;
>   		}
Johannes Thumshirn Feb. 19, 2025, 7:36 a.m. UTC | #2
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1512eb94b6e5..f80db81fc853 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1378,8 +1378,14 @@  static noinline int cow_file_range(struct btrfs_inode *inode,
 				continue;
 			}
 			if (done_offset) {
-				*done_offset = start - 1;
-				return 0;
+				/*
+				 * Move @end to the end of the processed range,
+				 * and exit the loop to unlock the processed
+				 * extents.
+				 */
+				end = start - 1;
+				ret = 0;
+				break;
 			}
 			ret = -ENOSPC;
 		}