| Message ID | 20230309090526.332550-5-hch@lst.de |
|---|---|
| State | New, archived |
| Series | [01/20] btrfs: mark extent_buffer_under_io static |
On 09.03.23 10:07, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer. While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it. On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance. Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way. Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
>
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.

Can someone smarter than me explain why we need to iterate eb->pages four
times in this function? This doesn't look super efficient to me.

> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}

I think we could also break out of the loop here. As soon as one page
isn't uptodate we don't care about the fast path anymore.
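[For readers following the thread: at this point in the series the function makes four separate passes over eb->pages, roughly lock, check, submit, wait. Below is a minimal sketch of that shape with stand-in types and hypothetical helper names (read_eb_pages_sketch, submit_page_read, and the struct page here are illustrative stand-ins, not kernel APIs); the real code also handles trylock, error paths, and per-page state.]

#include <stdbool.h>

#define NUM_PAGES 4

struct page {
	bool uptodate;
	bool locked;
};

static void lock_page(struct page *p)           { p->locked = true; }
static void unlock_page(struct page *p)         { p->locked = false; }
static void wait_on_page_locked(struct page *p) { (void)p; /* block until I/O unlocks it */ }
static void submit_page_read(struct page *p)    { (void)p; /* queue the page into a bio */ }

static void read_eb_pages_sketch(struct page pages[NUM_PAGES], bool wait)
{
	bool all_uptodate = true;
	int i;

	for (i = 0; i < NUM_PAGES; i++)		/* pass 1: lock every page */
		lock_page(&pages[i]);

	for (i = 0; i < NUM_PAGES; i++)		/* pass 2: sample the uptodate bits */
		if (!pages[i].uptodate)
			all_uptodate = false;

	if (all_uptodate) {			/* fast path: nothing to read */
		for (i = 0; i < NUM_PAGES; i++)
			unlock_page(&pages[i]);
		return;
	}

	for (i = 0; i < NUM_PAGES; i++)		/* pass 3: submit or skip each page */
		if (!pages[i].uptodate)
			submit_page_read(&pages[i]);
		else
			unlock_page(&pages[i]);

	if (!wait)
		return;

	for (i = 0; i < NUM_PAGES; i++)		/* pass 4: wait for the reads to finish */
		wait_on_page_locked(&pages[i]);
}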
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

We don't, and only a single iteration is left by the end of the series :)
On 2023/3/9 17:05, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer. While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it. On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance. Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way. Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
>
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/extent_io.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 302af9b01bda2a..26d8162bee000d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	int locked_pages = 0;
>  	int all_uptodate = 1;
>  	int num_pages;
> -	unsigned long num_reads = 0;
>  	struct btrfs_bio_ctrl bio_ctrl = {
>  		.opf = REQ_OP_READ,
>  		.mirror_num = mirror_num,
> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}
>  	}
>
>  	if (all_uptodate) {
> @@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>
>  	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
>  	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_reads);
> +	atomic_set(&eb->io_pages, num_pages);
>  	/*
>  	 * It is possible for release_folio to clear the TREE_REF bit before we
>  	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
>  	 */
> @@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
>
> -		if (!PageUptodate(page)) {
> -			ClearPageError(page);
> -			submit_extent_page(&bio_ctrl, page_offset(page), page,
> -					   PAGE_SIZE, 0);
> -		} else {
> -			unlock_page(page);
> -		}
> +		ClearPageError(page);
> +		submit_extent_page(&bio_ctrl, page_offset(page), page,
> +				   PAGE_SIZE, 0);
>  	}
>
>  	submit_one_bio(&bio_ctrl);
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> > -		if (!PageUptodate(page)) {
> > -			num_reads++;
> > +		if (!PageUptodate(page))
> >  			all_uptodate = 0;
> > -		}
>
> I think we could also break out of the loop here. As soon as
> one page isn't uptodate we don't care about the fast path anymore.

FYI, I've decided to keep this as-is for the next version as the code is
gone by the end of the series anyway.
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> On 09.03.23 10:07, Christoph Hellwig wrote:
> > Currently read_extent_buffer_pages skips pages that are already uptodate
> > when reading in an extent_buffer. While this reduces the amount of data
> > read, it increases the number of I/O operations as we now need to do
> > multiple I/Os when reading an extent buffer with one or more uptodate
> > pages in the middle of it. On any modern storage device, be that hard
> > drives or SSDs this actually decreases I/O performance. Fortunately
> > this case is pretty rare as the pages are always initially read together
> > and then aged the same way. Besides simplifying the code a bit as-is
> > this will allow for major simplifications to the I/O completion handler
> > later on.
> >
> > Note that the case where all pages are uptodate is still handled by an
> > optimized fast path that does not read any data from disk.
>
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

I remember one bug that was solved by splitting the first loop into two,
one locking all pages first and then checking the uptodate status; the
comment explains that.

2571e739677f ("Btrfs: fix memory leak in reading btree blocks")

As you can see in the changelog, it's a bit of a convoluted race, and the
number of loops should not matter if they're there for correctness
reasons. I haven't checked the final code, but I doubt it's equivalent,
and it may introduce subtle bugs.
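[Schematically, the loop split David refers to looks like the sketch below. The struct and helpers are stand-ins, not the actual kernel code, and the full race is described in the 2571e739677f changelog; this only shows the structural difference between counting while still acquiring locks and counting after everything is locked.]

#include <stdbool.h>

struct page { bool uptodate; };

static void lock_page(struct page *p)     { (void)p; /* acquire the page lock */ }
static bool page_uptodate(struct page *p) { return p->uptodate; }

/* Before 2571e739677f: locking and the uptodate check shared one loop,
 * so the count was assembled while later pages were still unlocked and
 * a concurrent read could still be completing against them. */
static int count_reads_interleaved(struct page *pages, int num_pages)
{
	int num_reads = 0;

	for (int i = 0; i < num_pages; i++) {
		lock_page(&pages[i]);
		if (!page_uptodate(&pages[i]))
			num_reads++;
	}
	return num_reads;
}

/* After the fix: every page is locked before any uptodate bit is
 * sampled, so the function acts on one consistent snapshot. */
static int count_reads_split(struct page *pages, int num_pages)
{
	int num_reads = 0;
	int i;

	for (i = 0; i < num_pages; i++)
		lock_page(&pages[i]);
	for (i = 0; i < num_pages; i++)
		if (!page_uptodate(&pages[i]))
			num_reads++;
	return num_reads;
}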
On Sat, Mar 18, 2023 at 12:16:09AM +0100, David Sterba wrote:
> I remember one bug that was solved by splitting the first loop into two,
> one locking all pages first and then checking the uptodate status; the
> comment explains that.
>
> 2571e739677f ("Btrfs: fix memory leak in reading btree blocks")
>
> As you can see in the changelog, it's a bit of a convoluted race, and the
> number of loops should not matter if they're there for correctness
> reasons. I haven't checked the final code, but I doubt it's equivalent,
> and it may introduce subtle bugs.

The patch we're replying to would actually have fixed that bug on its
own. The problem it solves is that num_pages could be smaller than the
actual number of pages in the extent_buffer. With this change, we always
read all pages and thus can't have a wrong ->io_pages. In fact, a later
patch removes ->io_pages entirely.

Even better, at the end of the series the locking between different
reads moves to the extent_buffer level, so the above race is
fundamentally fixed at a higher level and there won't be an extra read
at all.
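[To illustrate the ->io_pages point Christoph makes here, a schematic with hypothetical names (eb_sketch, start_read, page_read_done; C11 atomics stand in for the kernel's atomic_t, and the real completion path lives in fs/btrfs/extent_io.c): after this patch the submitter sets the in-flight counter to num_pages and submits exactly num_pages pages, so the per-page completions always drive the counter to zero.]

#include <stdatomic.h>

struct eb_sketch {
	atomic_int io_pages;	/* page reads still in flight */
};

/* Submission side: every page is submitted, so the in-flight counter is
 * simply the page count and cannot disagree with the number of
 * completions that will arrive. */
static void start_read(struct eb_sketch *eb, int num_pages)
{
	atomic_store(&eb->io_pages, num_pages);
	/* ... submit one contiguous bio covering all num_pages pages ... */
}

/* Completion side, called once per page: the final completion is what
 * marks the buffer uptodate and releases the reference held for the
 * read, which is why the counter must be exact. */
static void page_read_done(struct eb_sketch *eb)
{
	if (atomic_fetch_sub(&eb->io_pages, 1) == 1) {
		/* last page: mark the buffer uptodate, drop the read ref */
	}
}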
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 302af9b01bda2a..26d8162bee000d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int locked_pages = 0;
 	int all_uptodate = 1;
 	int num_pages;
-	unsigned long num_reads = 0;
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.opf = REQ_OP_READ,
 		.mirror_num = mirror_num,
@@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	 */
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		if (!PageUptodate(page)) {
-			num_reads++;
+		if (!PageUptodate(page))
 			all_uptodate = 0;
-		}
 	}

 	if (all_uptodate) {
@@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,

 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_reads);
+	atomic_set(&eb->io_pages, num_pages);
 	/*
 	 * It is possible for release_folio to clear the TREE_REF bit before we
 	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
 	 */
@@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];

-		if (!PageUptodate(page)) {
-			ClearPageError(page);
-			submit_extent_page(&bio_ctrl, page_offset(page), page,
-					   PAGE_SIZE, 0);
-		} else {
-			unlock_page(page);
-		}
+		ClearPageError(page);
+		submit_extent_page(&bio_ctrl, page_offset(page), page,
+				   PAGE_SIZE, 0);
 	}

 	submit_one_bio(&bio_ctrl);
Currently read_extent_buffer_pages skips pages that are already uptodate
when reading in an extent_buffer. While this reduces the amount of data
read, it increases the number of I/O operations as we now need to do
multiple I/Os when reading an extent buffer with one or more uptodate
pages in the middle of it. On any modern storage device, be that hard
drives or SSDs this actually decreases I/O performance. Fortunately
this case is pretty rare as the pages are always initially read together
and then aged the same way. Besides simplifying the code a bit as-is
this will allow for major simplifications to the I/O completion handler
later on.

Note that the case where all pages are uptodate is still handled by an
optimized fast path that does not read any data from disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)
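[To make the I/O-count argument in the commit message concrete, a toy user-space illustration; count_bios is a hypothetical helper, not kernel code. With the old skip-uptodate behavior, an uptodate page in the middle of a 4-page extent buffer splits the read into two bios, while the new behavior always issues one contiguous read.]

#include <stdbool.h>
#include <stdio.h>

/* Count the bios needed when uptodate pages are skipped: every gap in
 * the run of !uptodate pages starts a new, discontiguous I/O. */
static int count_bios(const bool *uptodate, int num_pages)
{
	int bios = 0;
	bool in_io = false;

	for (int i = 0; i < num_pages; i++) {
		if (!uptodate[i]) {
			if (!in_io)
				bios++;		/* a new discontiguous I/O starts */
			in_io = true;
		} else {
			in_io = false;		/* skipping this page breaks the bio */
		}
	}
	return bios;
}

int main(void)
{
	/* page 2 of 4 is already uptodate */
	bool uptodate[4] = { false, false, true, false };

	/* old behavior: skip uptodate pages -> 2 separate bios */
	printf("skipping uptodate pages: %d bios\n", count_bios(uptodate, 4));

	/* new behavior: read everything -> always 1 contiguous bio */
	printf("reading all pages:       1 bio\n");
	return 0;
}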