
[v2,2/5] btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range()

Message ID 21ee2b756ce8ad1dcf1b9ecdfec84f0b87c271f5.1743239672.git.wqu@suse.com (mailing list archive)
State New
Series btrfs: add the missing preparations exposed by initial large data folio support

Commit Message

Qu Wenruo March 29, 2025, 9:19 a.m. UTC
[BUG]
When running btrfs/004 with a 4K fs block size and 64K page size, the
fsstress workload can sometimes consume 100% CPU for a while, but not
long enough to trigger the 120s hang warning.

[CAUSE]
When such 100% CPU usage happens, btrfs_punch_hole_lock_range() is
always in the call trace.

With extra ftrace debugs, it shows something like this:

   btrfs_punch_hole_lock_range: r/i=5/2745 start=4096(65536)
end=20479(18446744073709551615) enter

Where 4096 is the @lockstart parameter, 65536 is the rounded-up
@page_lockstart, and 20479 is the @lockend parameter.
So far so good.

But the large number (u64)-1 is the @page_lockend, which is not correct.

This is caused by the fact that round_down(lockend + 1, PAGE_SIZE)
results in 0, and subtracting 1 from it underflows to (u64)-1.

In the above case, the range is inside the same page, and we do not even
need to call filemap_range_has_page(), let alone call it with
(u64)-1 as the end.

This behavior causes btrfs_punch_hole_lock_range() to busy loop,
waiting for an irrelevant range to have its pages dropped.

[FIX]
Calculate @page_lockend by rounding down @lockend + 1, without
decreasing the result by one.
So @page_lockend will no longer underflow.

Then exit early if @page_lockend is no larger than @page_lockstart,
as that means either the range is inside the same page, or the two
pages are already adjacent.

Finally only decrease @page_lockend when calling
filemap_range_has_page().

Fixes: 0528476b6ac7 ("btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range()")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/file.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Filipe Manana March 31, 2025, 11:24 a.m. UTC | #1
On Sat, Mar 29, 2025 at 9:20 AM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> When running btrfs/004 with 4K fs block size and 64K page size,
> sometimes fsstress workload can take 100% CPU for a while, but not long
> enough to trigger a 120s hang warning.
>
> [CAUSE]
> When such 100% CPU usage happens, btrfs_punch_hole_lock_range() is
> always in the call trace.
>
> With extra ftrace debugs, it shows something like this:
>
>    btrfs_punch_hole_lock_range: r/i=5/2745 start=4096(65536)

What's this r/i=5/2745?

Since this is from a custom tracepoint you made for debugging, I
suggest removing that part from the changelog.
It's not obvious (to me at least) what it is and it's not relevant to
understand the problem being fixed.

> end=20479(18446744073709551615) enter
>
> Where 4096 is the @lockstart parameter, 65536 is the rounded up
> @page_lockstart, 20479 is the @lockend parameter.
> So far so good.
>
> But the large number (u64)-1 is the @page_lockend, which is not correct.
>
> This is caused by the fact that round_down(locked + 1, PAGE_SIZE)
> results 0.
>
> In the above case, the range is inside the same page, and we do not even
> need to call filemap_range_has_page(), not to mention to call it with
> (u64)-1 as the end.
>
> This behavior will cause btrfs_punch_hole_lock_range() to busy loop
> waiting for irrelevant range to has its pages to be dropped.
>
> [FIX]
> Calculate @page_lockend by just rounding down @lockend, without
> decreasing the value by one.
> So @page_lockend will no longer overflow.
>
> Then exit early if @page_lockend is no larger than @page_lockestart.

@page_lockestart -> @page_lockstart


> As it means either the range is inside the same page, or the two pages
> are adjacent already.
>
> Finally only decrease @page_lockend when calling
> filemap_range_has_page().
>
> Fixes: 0528476b6ac7 ("btrfs: fix the filemap_range_has_page() call in btrfs_punch_hole_lock_range()")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

With those minor things fixed up, it looks good, thanks.

> ---
>  fs/btrfs/file.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 589d36f8de12..7c147ef9368d 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2126,15 +2126,20 @@ static void btrfs_punch_hole_lock_range(struct inode *inode,
>          * will always return true.
>          * So here we need to do extra page alignment for
>          * filemap_range_has_page().
> +        *
> +        * And do not decrease page_lockend right now, as it can be 0.
>          */
>         const u64 page_lockstart = round_up(lockstart, PAGE_SIZE);
> -       const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1;
> +       const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE);
>
>         while (1) {
>                 truncate_pagecache_range(inode, lockstart, lockend);
>
>                 lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
>                             cached_state);
> +               /* The same page or adjacent pages. */
> +               if (page_lockend <= page_lockstart)
> +                       break;
>                 /*
>                  * We can't have ordered extents in the range, nor dirty/writeback
>                  * pages, because we have locked the inode's VFS lock in exclusive
> @@ -2146,7 +2151,7 @@ static void btrfs_punch_hole_lock_range(struct inode *inode,
>                  * we do, unlock the range and retry.
>                  */
>                 if (!filemap_range_has_page(inode->i_mapping, page_lockstart,
> -                                           page_lockend))
> +                                           page_lockend - 1))
>                         break;
>
>                 unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
> --
> 2.49.0
>
>

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 589d36f8de12..7c147ef9368d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2126,15 +2126,20 @@  static void btrfs_punch_hole_lock_range(struct inode *inode,
 	 * will always return true.
 	 * So here we need to do extra page alignment for
 	 * filemap_range_has_page().
+	 *
+	 * And do not decrease page_lockend right now, as it can be 0.
 	 */
 	const u64 page_lockstart = round_up(lockstart, PAGE_SIZE);
-	const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE) - 1;
+	const u64 page_lockend = round_down(lockend + 1, PAGE_SIZE);
 
 	while (1) {
 		truncate_pagecache_range(inode, lockstart, lockend);
 
 		lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			    cached_state);
+		/* The same page or adjacent pages. */
+		if (page_lockend <= page_lockstart)
+			break;
 		/*
 		 * We can't have ordered extents in the range, nor dirty/writeback
 		 * pages, because we have locked the inode's VFS lock in exclusive
@@ -2146,7 +2151,7 @@  static void btrfs_punch_hole_lock_range(struct inode *inode,
 		 * we do, unlock the range and retry.
 		 */
 		if (!filemap_range_has_page(inode->i_mapping, page_lockstart,
-					    page_lockend))
+					    page_lockend - 1))
 			break;
 
 		unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend,