Message ID | 9ec01e023cc34e5729dd4a86affd5158f87c7a83.1711311627.git.wqu@suse.com
---|---
State | New, archived
Series | btrfs: fallback to single page allocation to avoid bulk allocation latency
On Sun, Mar 24, 2024 at 8:21 PM Qu Wenruo <wqu@suse.com> wrote:
>
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array().
>
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed we still retry but with extra
> memalloc_retry_wait().
>
> If the bulk alloc only returned one page at a time, we would spend a lot
> of time on the retry wait.
>
> [FIX]
> Instead of always trying the same bulk allocation, fall back to single
> page allocation if the initial bulk allocation attempt doesn't fill the
> whole request.
>
> Even if this means a higher chance of memory allocation failure.

Well, this last sentence is a bit confusing. I mean, the chances are the
same as what we had before commit 91d6ac1d62c3 ("btrfs: allocate page
arrays using bulk page allocator").

>
> Reported-by: Julian Taylor <julian.taylor@1und1.de>
> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/

As Julian seems to have tested this and confirmed it solves the
regression for him, a Tested-by tag should be here.

Also, given that this seems like a huge performance drop, a Fixes tag
would be appropriate here.

Just one comment about the code below.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 35 ++++++++++++-----------------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7441245b1ceb..d49e7f0384ed 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -681,33 +681,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
>  int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
>                             gfp_t extra_gfp)
>  {
> +       const gfp_t gfp = GFP_NOFS | extra_gfp;
>         unsigned int allocated;
>
> -       for (allocated = 0; allocated < nr_pages;) {
> -               unsigned int last = allocated;
> -
> -               allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
> -                                                  nr_pages, page_array);
> -
> -               if (allocated == nr_pages)
> -                       return 0;
> -
> -               /*
> -                * During this iteration, no page could be allocated, even
> -                * though alloc_pages_bulk_array() falls back to alloc_page()
> -                * if it could not bulk-allocate. So we must be out of memory.
> -                */
> -               if (allocated == last) {
> -                       for (int i = 0; i < allocated; i++) {
> -                               __free_page(page_array[i]);
> -                               page_array[i] = NULL;
> -                       }
> -                       return -ENOMEM;
> -               }
> -
> -               memalloc_retry_wait(GFP_NOFS);
> +       allocated = alloc_pages_bulk_array(GFP_NOFS | gfp, nr_pages, page_array);

No need to OR in GFP_NOFS here, the gfp variable already has it.

Otherwise it looks good, so with that adjusted:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> +       for (; allocated < nr_pages; allocated++) {
> +               page_array[allocated] = alloc_page(gfp);
> +               if (unlikely(!page_array[allocated]))
> +                       goto enomem;
>         }
>         return 0;
> +enomem:
> +       for (int i = 0; i < allocated; i++) {
> +               __free_page(page_array[i]);
> +               page_array[i] = NULL;
> +       }
> +       return -ENOMEM;
>  }
>
>  /*
> --
> 2.44.0
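For reference, here is a sketch of the helper with Filipe's adjustment
folded in: the same logic the diff introduces, with the redundant
GFP_NOFS dropped from the bulk call. This is assembled from the patch
above, not the final merged code.

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Sketch of btrfs_alloc_page_array() after Qu's rework, with the
 * duplicated GFP_NOFS removed as Filipe suggests (gfp already
 * contains it). One bulk attempt, then a single-page fallback.
 */
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
                           gfp_t extra_gfp)
{
        const gfp_t gfp = GFP_NOFS | extra_gfp;
        unsigned int allocated;

        /* One bulk attempt; it may fill the array only partially. */
        allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);

        /* Fill the remainder one page at a time, with no retry wait. */
        for (; allocated < nr_pages; allocated++) {
                page_array[allocated] = alloc_page(gfp);
                if (unlikely(!page_array[allocated]))
                        goto enomem;
        }
        return 0;

enomem:
        /* Undo the partial success so callers see an all-NULL array. */
        for (int i = 0; i < allocated; i++) {
                __free_page(page_array[i]);
                page_array[i] = NULL;
        }
        return -ENOMEM;
}
```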
On 3/24/24 16:20, Qu Wenruo wrote:
> [BUG]
> There is a recent report that when memory pressure is high (including
> cached pages), btrfs can spend most of its time on memory allocation in
> btrfs_alloc_page_array().
>
> [CAUSE]
> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
> even if the bulk allocation failed we still retry but with extra
> memalloc_retry_wait().
>
> If the bulk alloc only returned one page at a time, we would spend a lot
> of time on the retry wait.
>
> [FIX]
> Instead of always trying the same bulk allocation, fall back to single
> page allocation if the initial bulk allocation attempt doesn't fill the
> whole request.

I still think the argument in 395cb57e85604 is reasonably compelling:
that it's polite and normative among filesystems to throw in a
retry_wait() between attempts to allocate, and I'm not sure why we're
seeing this performance impact where others seemingly don't.

But your change does a lot more than just cutting out the retry_wait();
it only tries the bulk allocator once. Given that the bulk allocator
falls back to the single-page allocator itself and can allocate multiple
pages on any given iteration, I think it's better to just cut out the
retry_wait() if we must, in hopes we can allocate more than one page
when retrying?

Or, could we add __GFP_RETRY_MAYFAIL to the btrfs_alloc_page_array()
call in compression.c instead, so that page reclaim is started if the
initial alloc_pages_bulk_array() doesn't work out?
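A minimal sketch of that second alternative, assuming the caller is the
compression path and that `cb->nr_pages` / `cb->compressed_pages` name
the fields involved (both are paraphrased here, not quoted from
compression.c):

```c
/*
 * Hypothetical illustration of the __GFP_RETRY_MAYFAIL alternative:
 * keep btrfs_alloc_page_array()'s retry loop as is and let this one
 * caller opt in to direct reclaim, so a short initial bulk allocation
 * triggers reclaim rather than repeated retry waits. The field names
 * are assumed for illustration.
 */
ret = btrfs_alloc_page_array(cb->nr_pages, cb->compressed_pages,
                             __GFP_RETRY_MAYFAIL);
if (ret)
        return ret;     /* -ENOMEM; the array was reset to NULLs */
```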
On 2024/3/26 07:48, Sweet Tea Dorminy wrote:
>
> On 3/24/24 16:20, Qu Wenruo wrote:
>> [BUG]
>> There is a recent report that when memory pressure is high (including
>> cached pages), btrfs can spend most of its time on memory allocation in
>> btrfs_alloc_page_array().
>>
>> [CAUSE]
>> For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
>> even if the bulk allocation failed we still retry but with extra
>> memalloc_retry_wait().
>>
>> If the bulk alloc only returned one page at a time, we would spend a
>> lot of time on the retry wait.
>>
>> [FIX]
>> Instead of always trying the same bulk allocation, fall back to single
>> page allocation if the initial bulk allocation attempt doesn't fill the
>> whole request.
>
> I still think the argument in 395cb57e85604 is reasonably compelling:
> that it's polite and normative among filesystems to throw in a
> retry_wait() between attempts to allocate, and I'm not sure why we're
> seeing this performance impact where others seemingly don't.

The reporter is hitting this during compression, meanwhile no other
major fs (until the recent bcachefs?) supports compression.

Thus I guess we have a corner case where other fses don't?

>
> But your change does a lot more than just cutting out the retry_wait();
> it only tries the bulk allocator once. Given that the bulk allocator
> falls back to the single-page allocator itself and can allocate multiple
> pages on any given iteration, I think it's better to just cut out the
> retry_wait() if we must, in hopes we can allocate more than one page
> when retrying?
>
> Or, could we add __GFP_RETRY_MAYFAIL to the btrfs_alloc_page_array()
> call in compression.c instead, so that page reclaim is started if the
> initial alloc_pages_bulk_array() doesn't work out?

My question is, I don't have a good reference on why we should do the
retry wait even after a successful allocation.

In f2fs/ext4/XFS, memalloc_retry_wait() is only called on page
allocation failure (aka, no allocation is done at all), not after each
bulk allocation unconditionally.

Thus I'm wondering if it's really a policy/common practice to do the
wait after a successful allocation at all.

In that case, I'm totally fine to only call the retry wait if we are
unable to allocate any page, rather than unconditionally.

Thanks,
Qu
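The convention Qu describes reduces to the pattern below. This is a
paraphrase of the shape only, not code copied from f2fs, ext4, or XFS,
and the function name is invented: the wait is tied to an attempt that
produced nothing, never to one that made partial progress.

```c
#include <linux/gfp.h>
#include <linux/sched/mm.h>

/* Pattern sketch: only sleep in memalloc_retry_wait() when an
 * allocation attempt returned nothing at all. */
static struct page *alloc_page_retry(void)
{
        struct page *page;

        do {
                page = alloc_page(GFP_NOFS);
                if (!page)
                        memalloc_retry_wait(GFP_NOFS);
        } while (!page);

        return page;
}
```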
>
> The reporter is hitting this during compression, meanwhile no other
> major fs (until the recent bcachefs?) supports compression.
>
> Thus I guess we have a corner case where other fses don't?

Makes sense. And since read success depends on it, it's latency
sensitive, so I think that could be a justification if we did eventually
want to throw __GFP_RETRY_MAYFAIL into the flags.

>
> My question is, I don't have a good reference on why we should do the
> retry wait even after a successful allocation.
>
> In f2fs/ext4/XFS, memalloc_retry_wait() is only called on page
> allocation failure (aka, no allocation is done at all), not after each
> bulk allocation unconditionally.
>
> Thus I'm wondering if it's really a policy/common practice to do the
> wait after a successful allocation at all.

You make a good point.
https://elixir.bootlin.com/linux/latest/source/include/linux/sched/mm.h#L267
seems to indicate one should use it on every retry of a memory
allocation, but in practice it does appear to be done only on a complete
failure to allocate.

>
> In that case, I'm totally fine to only call the retry wait if we are
> unable to allocate any page, rather than unconditionally.
>

Okay, I agree that's a good plan, and I appreciate the discussion. I
don't know if it's worth giving up and freeing if we fail to make
progress twice, but I don't think it matters.

Thank you very much!

Sweet Tea
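Taken together, the compromise the two converge on might look like the
sketch below: keep retrying the bulk allocator, wait only after a pass
that made no progress, and give up after two stalled passes in a row.
The give-up-after-two threshold is taken from Sweet Tea's closing remark
and is an assumption, not what was eventually merged.

```c
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>

/* Sketch of the compromise: memalloc_retry_wait() only on a
 * zero-progress pass, matching the f2fs/ext4/XFS convention. */
int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
                           gfp_t extra_gfp)
{
        const gfp_t gfp = GFP_NOFS | extra_gfp;
        unsigned int allocated = 0;
        bool stalled = false;

        while (allocated < nr_pages) {
                unsigned int last = allocated;

                allocated = alloc_pages_bulk_array(gfp, nr_pages, page_array);
                if (allocated == nr_pages)
                        return 0;

                if (allocated != last) {
                        /* Made progress: retry immediately, no wait. */
                        stalled = false;
                        continue;
                }

                if (stalled) {
                        /* Two zero-progress passes in a row: give up. */
                        for (int i = 0; i < allocated; i++) {
                                __free_page(page_array[i]);
                                page_array[i] = NULL;
                        }
                        return -ENOMEM;
                }

                /* First zero-progress pass: wait once, then retry. */
                stalled = true;
                memalloc_retry_wait(gfp);
        }
        return 0;
}
```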
```diff
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7441245b1ceb..d49e7f0384ed 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -681,33 +681,22 @@ static void end_bbio_data_read(struct btrfs_bio *bbio)
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array,
                            gfp_t extra_gfp)
 {
+       const gfp_t gfp = GFP_NOFS | extra_gfp;
        unsigned int allocated;
 
-       for (allocated = 0; allocated < nr_pages;) {
-               unsigned int last = allocated;
-
-               allocated = alloc_pages_bulk_array(GFP_NOFS | extra_gfp,
-                                                  nr_pages, page_array);
-
-               if (allocated == nr_pages)
-                       return 0;
-
-               /*
-                * During this iteration, no page could be allocated, even
-                * though alloc_pages_bulk_array() falls back to alloc_page()
-                * if it could not bulk-allocate. So we must be out of memory.
-                */
-               if (allocated == last) {
-                       for (int i = 0; i < allocated; i++) {
-                               __free_page(page_array[i]);
-                               page_array[i] = NULL;
-                       }
-                       return -ENOMEM;
-               }
-
-               memalloc_retry_wait(GFP_NOFS);
+       allocated = alloc_pages_bulk_array(GFP_NOFS | gfp, nr_pages, page_array);
+       for (; allocated < nr_pages; allocated++) {
+               page_array[allocated] = alloc_page(gfp);
+               if (unlikely(!page_array[allocated]))
+                       goto enomem;
        }
        return 0;
+enomem:
+       for (int i = 0; i < allocated; i++) {
+               __free_page(page_array[i]);
+               page_array[i] = NULL;
+       }
+       return -ENOMEM;
 }
 
 /*
```
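For context, a hypothetical caller of the reworked helper might look
like the fragment below. It relies on the contract visible in the diff,
namely that the array starts NULL-filled and is reset to NULLs on
failure; the 16-page size and surrounding code are invented for
illustration.

```c
/* Hypothetical usage sketch; not taken from any btrfs call site. */
struct page *pages[16] = { NULL };      /* must start NULL-filled */
int ret;

ret = btrfs_alloc_page_array(ARRAY_SIZE(pages), pages, 0);
if (ret)
        return ret;     /* -ENOMEM; entries were freed and reset to NULL */

/* ... use the pages ... */

for (int i = 0; i < ARRAY_SIZE(pages); i++)
        __free_page(pages[i]);
```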
[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array().

[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed we still retry but with extra
memalloc_retry_wait().

If the bulk alloc only returned one page at a time, we would spend a lot
of time on the retry wait.

[FIX]
Instead of always trying the same bulk allocation, fall back to single
page allocation if the initial bulk allocation attempt doesn't fill the
whole request.

Even if this means a higher chance of memory allocation failure.

Reported-by: Julian Taylor <julian.taylor@1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)