btrfs: do not wait for short bulk allocation

Message ID

78e109cdbec7b11b1832822143d483509abb059e.1712266967.git.wqu@suse.com (mailing list archive)

State

New, archived

Headers

From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: Julian Taylor <julian.taylor@1und1.de>,
	Filipe Manana <fdmanana@suse.com>
Subject: [PATCH] btrfs: do not wait for short bulk allocation
Date: Fri,  5 Apr 2024 08:13:11 +1030
Message-ID: 
 <78e109cdbec7b11b1832822143d483509abb059e.1712266967.git.wqu@suse.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

btrfs: do not wait for short bulk allocation

Commit Message

Qu Wenruo April 4, 2024, 9:43 p.m. UTC

[BUG]
A recent report shows that under high memory pressure (including from
cached pages), btrfs can spend most of its time allocating memory in
btrfs_alloc_page_array() for compressed reads and writes.

[CAUSE]
btrfs_alloc_page_array() always calls alloc_pages_bulk_array(). Whenever
that returns fewer pages than requested but still makes some progress
(for example because it fell back to single-page allocation), we call
memalloc_retry_wait() before retrying.

If the bulk allocator only returns one page at a time, we spend a lot of
time in those retry waits.
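The cost of this path can be illustrated with a userspace sketch of the pre-patch loop. Everything here is hypothetical scaffolding rather than kernel code: `struct page` is a dummy type, `mock_alloc_pages_bulk()` only imitates the contract of alloc_pages_bulk_array() (it fills NULL slots from the front, at most `mock_pages_per_call` per call, and returns the total number of populated entries), and the memalloc_retry_wait() sleep is replaced by the `retry_waits` counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int dummy;
};

/* Hypothetical knob: pages the mock bulk allocator hands out per call. */
static unsigned int mock_pages_per_call = 1;

/* Counts the rounds in which the kernel would call memalloc_retry_wait(). */
static unsigned int retry_waits;

/*
 * Hypothetical mock of alloc_pages_bulk_array(): skips slots that are
 * already populated, fills up to mock_pages_per_call NULL slots, and
 * returns the total number of populated entries.
 */
static unsigned int mock_alloc_pages_bulk(unsigned int nr_pages,
					  struct page **page_array)
{
	unsigned int populated = 0;
	unsigned int budget = mock_pages_per_call;

	assert(page_array != NULL);
	while (populated < nr_pages && page_array[populated])
		populated++;
	while (populated < nr_pages && budget > 0) {
		page_array[populated] = malloc(sizeof(struct page));
		if (!page_array[populated])
			break;
		populated++;
		budget--;
	}
	return populated;
}

/* Userspace mirror of the pre-patch loop in btrfs_alloc_page_array(). */
static int old_alloc_page_array(unsigned int nr_pages,
				struct page **page_array)
{
	unsigned int allocated;

	for (allocated = 0; allocated < nr_pages;) {
		unsigned int last = allocated;

		allocated = mock_alloc_pages_bulk(nr_pages, page_array);
		if (allocated == nr_pages)
			return 0;

		/* No progress at all: free what we have and fail. */
		if (allocated == last) {
			for (unsigned int i = 0; i < allocated; i++) {
				free(page_array[i]);
				page_array[i] = NULL;
			}
			return -ENOMEM;
		}

		/* Short but non-zero progress: the kernel sleeps here. */
		retry_waits++;
	}
	return 0;
}
```

With `mock_pages_per_call = 1`, a 32-page request completes only after 31 counted waits; with `mock_pages_per_call = 32` it completes with none.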

The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations").

[FIX]
Although that commit states that other filesystems do the same wait,
that is not the case, at least not nowadays.

All mainline filesystems call memalloc_retry_wait() only when they fail
to allocate any page at all (and this holds for their non-bulk
allocation paths as well). If any progress was made, they do not call
memalloc_retry_wait().

For example, xfs_buf_alloc_pages() only calls memalloc_retry_wait() if
there was no allocation progress at all and the call is not for
metadata readahead.
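The two retry policies can be contrasted as small predicates. These helpers are hypothetical and only encode the conditions described above; they are not taken from the btrfs or xfs sources. `last` is the number of populated entries before a round and `allocated` the number after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch btrfs: sleep after any round that was short but made progress. */
static bool old_btrfs_should_wait(unsigned int last, unsigned int allocated,
				  unsigned int nr_pages)
{
	/* The populated count can only grow between rounds. */
	assert(allocated >= last);
	return allocated != nr_pages && allocated != last;
}

/*
 * Policy described for xfs_buf_alloc_pages(): sleep only when a round made
 * no progress at all and the caller is not doing metadata readahead.
 */
static bool xfs_like_should_wait(unsigned int last, unsigned int allocated,
				 bool readahead)
{
	assert(allocated >= last);
	return allocated == last && !readahead;
}
```

A round that goes from 3 to 4 of 32 pages makes the old btrfs predicate sleep, while the xfs-style one does not; the xfs-style policy only sleeps on a round that stays at 3 pages. In the old btrfs code such a zero-progress round fails with -ENOMEM instead of sleeping, so its predicate returns false there.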

So I don't believe we should call memalloc_retry_wait() unconditionally
after a short allocation.

This patch removes the memalloc_retry_wait() call completely, which
reduces the latency of btrfs_alloc_page_array(). If an iteration makes
no progress at all, we free the pages allocated so far and return
-ENOMEM. Tree block allocation goes with __GFP_NOFAIL, so it should not
fail in the first place, and regular allocations can afford to fail.
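The patched loop can be mirrored in userspace the same way. As before, `struct page`, `mock_alloc_pages_bulk()`, `mock_pages_per_call` and `mock_pool` are hypothetical scaffolding; `mock_pool` models how many pages remain before the mock runs out of memory. The loop body follows the diff: any progress leads to an immediate retry, and only a round with no progress frees the pages and returns -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int dummy;
};

/* Hypothetical knobs: pages per mock call, and pages left before "OOM". */
static unsigned int mock_pages_per_call = 1;
static unsigned int mock_pool = 1024;

/*
 * Hypothetical mock of alloc_pages_bulk_array(): skips populated slots,
 * fills NULL slots from the front while both the per-call budget and the
 * pool last, and returns the total number of populated entries.
 */
static unsigned int mock_alloc_pages_bulk(unsigned int nr_pages,
					  struct page **page_array)
{
	unsigned int populated = 0;
	unsigned int budget = mock_pages_per_call;

	assert(page_array != NULL);
	while (populated < nr_pages && page_array[populated])
		populated++;
	while (populated < nr_pages && budget > 0 && mock_pool > 0) {
		page_array[populated] = malloc(sizeof(struct page));
		if (!page_array[populated])
			break;
		populated++;
		budget--;
		mock_pool--;
	}
	return populated;
}

/* Userspace mirror of the patched btrfs_alloc_page_array() loop. */
static int new_alloc_page_array(unsigned int nr_pages,
				struct page **page_array)
{
	unsigned int allocated;

	for (allocated = 0; allocated < nr_pages;) {
		unsigned int last = allocated;

		allocated = mock_alloc_pages_bulk(nr_pages, page_array);
		if (allocated == last) {
			/* No progress in this round: free what we have and fail. */
			for (unsigned int i = 0; i < allocated; i++) {
				free(page_array[i]);
				page_array[i] = NULL;
			}
			return -ENOMEM;
		}
		/* Any progress: retry right away, without sleeping. */
	}
	return 0;
}
```

With `mock_pages_per_call = 2`, `mock_pool = 5` and an 8-page request, the loop progresses for three rounds (2, 4, 5 pages), then hits a round with no progress and returns -ENOMEM with every slot reset to NULL.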

Reported-by: Julian Taylor <julian.taylor@1und1.de>
Tested-by: Julian Taylor <julian.taylor@1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v3:
- Remove wait part completely
  For NOFAIL metadata allocation, the allocation itself should not fail.
  For regular allocation, we can afford the failure anyway.

v2:
- Still use bulk allocation function
  Since alloc_pages_bulk_array() falls back to single-page allocation
  by itself, there is no need to call alloc_page() manually.

- Update the commit message to indicate that other filesystems do not
  call memalloc_retry_wait() unconditionally
  In fact, they only call it when they need to retry hard and cannot
  really fail.
---
 fs/btrfs/extent_io.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

Comments

Sweet Tea Dorminy April 5, 2024, 1:10 a.m. UTC | #1

On 4/4/24 17:43, Qu Wenruo wrote:
> [...]
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> [...]

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bbdcb7475cea..48476f8fcf79 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -712,31 +712,21 @@ int btrfs_alloc_folio_array(unsigned int nr_folios, struct folio **folio_array,
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
 			   gfp_t extra_gfp)
 {
+	const gfp_t gfp = GFP_NOFS | extra_gfp;
 	unsigned int allocated;
 
 	for (allocated = 0; allocated < nr_pages;) {
 		unsigned int last = allocated;
 
-		allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
-						   nr_pages, page_array);
-
-		if (allocated == nr_pages)
-			return 0;
-
-		/*
-		 * During this iteration, no page could be allocated, even
-		 * though alloc_pages_bulk_array() falls back to alloc_page()
-		 * if  it could not bulk-allocate. So we must be out of memory.
-		 */
-		if (allocated == last) {
+		allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
+		if (unlikely(allocated == last)) {
+			/* Fail and do cleanup. */
 			for (int i = 0; i < allocated; i++) {
 				__free_page(page_array[i]);
 				page_array[i] = NULL;
 			}
 			return -ENOMEM;
 		}
-
-		memalloc_retry_wait(GFP_NOFS);
 	}
 	return 0;
 }
