| Message ID | 20230309090526.332550-5-hch@lst.de |
|---|---|
| State | New, archived |
| Series | [01/20] btrfs: mark extent_buffer_under_io static |
On 09.03.23 10:07, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer. While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it. On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance. Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way. Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
>
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.

Can someone smarter than me explain why we need to iterate eb->pages four
times in this function? This doesn't look super efficient to me.

> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}

I think we could also break out of the loop here. As soon as one page
isn't uptodate we don't care about the fast path anymore.
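[For readers following the thread: at this point in the series the function makes four separate passes over eb->pages, roughly lock, check, submit, wait. Below is a minimal sketch of that shape with stand-in types and hypothetical helper names (read_eb_pages_sketch, submit_page_read, and the struct page here are illustrative stand-ins, not kernel APIs); the real code also handles trylock, error paths, and per-page state.]

#include <stdbool.h>

#define NUM_PAGES 4

struct page {
	bool uptodate;
	bool locked;
};

static void lock_page(struct page *p)           { p->locked = true; }
static void unlock_page(struct page *p)         { p->locked = false; }
static void wait_on_page_locked(struct page *p) { (void)p; /* block until I/O unlocks it */ }
static void submit_page_read(struct page *p)    { (void)p; /* queue the page into a bio */ }

static void read_eb_pages_sketch(struct page pages[NUM_PAGES], bool wait)
{
	bool all_uptodate = true;
	int i;

	for (i = 0; i < NUM_PAGES; i++)		/* pass 1: lock every page */
		lock_page(&pages[i]);

	for (i = 0; i < NUM_PAGES; i++)		/* pass 2: sample the uptodate bits */
		if (!pages[i].uptodate)
			all_uptodate = false;

	if (all_uptodate) {			/* fast path: nothing to read */
		for (i = 0; i < NUM_PAGES; i++)
			unlock_page(&pages[i]);
		return;
	}

	for (i = 0; i < NUM_PAGES; i++)		/* pass 3: submit or skip each page */
		if (!pages[i].uptodate)
			submit_page_read(&pages[i]);
		else
			unlock_page(&pages[i]);

	if (!wait)
		return;

	for (i = 0; i < NUM_PAGES; i++)		/* pass 4: wait for the reads to finish */
		wait_on_page_locked(&pages[i]);
}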
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

We don't, and only a single iteration is left by the end of the series :)
On 2023/3/9 17:05, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer. While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it. On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance. Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way. Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
>
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/extent_io.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 302af9b01bda2a..26d8162bee000d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	int locked_pages = 0;
>  	int all_uptodate = 1;
>  	int num_pages;
> -	unsigned long num_reads = 0;
>  	struct btrfs_bio_ctrl bio_ctrl = {
>  		.opf = REQ_OP_READ,
>  		.mirror_num = mirror_num,
> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}
>  	}
>
>  	if (all_uptodate) {
> @@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>
>  	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
>  	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_reads);
> +	atomic_set(&eb->io_pages, num_pages);
>  	/*
>  	 * It is possible for release_folio to clear the TREE_REF bit before we
>  	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
>  	 */
> @@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
>
> -		if (!PageUptodate(page)) {
> -			ClearPageError(page);
> -			submit_extent_page(&bio_ctrl, page_offset(page), page,
> -					   PAGE_SIZE, 0);
> -		} else {
> -			unlock_page(page);
> -		}
> +		ClearPageError(page);
> +		submit_extent_page(&bio_ctrl, page_offset(page), page,
> +				   PAGE_SIZE, 0);
>  	}
>
>  	submit_one_bio(&bio_ctrl);
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> > -		if (!PageUptodate(page)) {
> > -			num_reads++;
> > +		if (!PageUptodate(page))
> >  			all_uptodate = 0;
> > -		}
>
> I think we could also break out of the loop here. As soon as
> one page isn't uptodate we don't care about the fast path anymore.

FYI, I've decided to keep this as-is for the next version as the code is
gone by the end of the series anyway.
On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> On 09.03.23 10:07, Christoph Hellwig wrote:
> > Currently read_extent_buffer_pages skips pages that are already uptodate
> > when reading in an extent_buffer. While this reduces the amount of data
> > read, it increases the number of I/O operations as we now need to do
> > multiple I/Os when reading an extent buffer with one or more uptodate
> > pages in the middle of it. On any modern storage device, be that hard
> > drives or SSDs this actually decreases I/O performance. Fortunately
> > this case is pretty rare as the pages are always initially read together
> > and then aged the same way. Besides simplifying the code a bit as-is
> > this will allow for major simplifications to the I/O completion handler
> > later on.
> >
> > Note that the case where all pages are uptodate is still handled by an
> > optimized fast path that does not read any data from disk.
>
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

I remember one bug that was solved by splitting the first loop into two,
one locking all pages first and then checking the uptodate status; the
comment explains that.

2571e739677f ("Btrfs: fix memory leak in reading btree blocks")

As you can see in the changelog, it's a bit of a convoluted race, and the
number of loops should not matter if they're there for correctness
reasons. I haven't checked the final code, but I doubt it's equivalent,
and it may introduce subtle bugs.
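[Schematically, the loop split David refers to looks like the sketch below. The struct and helpers are stand-ins, not the actual kernel code, and the full race is described in the 2571e739677f changelog; this only shows the structural difference between counting while still acquiring locks and counting after everything is locked.]

#include <stdbool.h>

struct page { bool uptodate; };

static void lock_page(struct page *p)     { (void)p; /* acquire the page lock */ }
static bool page_uptodate(struct page *p) { return p->uptodate; }

/* Before 2571e739677f: locking and the uptodate check shared one loop,
 * so the count was assembled while later pages were still unlocked and
 * a concurrent read could still be completing against them. */
static int count_reads_interleaved(struct page *pages, int num_pages)
{
	int num_reads = 0;

	for (int i = 0; i < num_pages; i++) {
		lock_page(&pages[i]);
		if (!page_uptodate(&pages[i]))
			num_reads++;
	}
	return num_reads;
}

/* After the fix: every page is locked before any uptodate bit is
 * sampled, so the function acts on one consistent snapshot. */
static int count_reads_split(struct page *pages, int num_pages)
{
	int num_reads = 0;
	int i;

	for (i = 0; i < num_pages; i++)
		lock_page(&pages[i]);
	for (i = 0; i < num_pages; i++)
		if (!page_uptodate(&pages[i]))
			num_reads++;
	return num_reads;
}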
On Sat, Mar 18, 2023 at 12:16:09AM +0100, David Sterba wrote:
> I remember one bug that was solved by splitting the first loop into two,
> one locking all pages first and then checking the uptodate status; the
> comment explains that.
>
> 2571e739677f ("Btrfs: fix memory leak in reading btree blocks")
>
> As you can see in the changelog, it's a bit of a convoluted race, and the
> number of loops should not matter if they're there for correctness
> reasons. I haven't checked the final code, but I doubt it's equivalent,
> and it may introduce subtle bugs.

The patch we're replying to would actually have fixed that bug on its
own. The problem it solves is that num_pages could be smaller than the
actual number of pages in the extent_buffer. With this change, we always
read all pages and thus can't have a wrong ->io_pages. In fact, a later
patch removes ->io_pages entirely.

Even better, at the end of the series the locking between different
reads moves to the extent_buffer level, so the above race is
fundamentally fixed at a higher level and there won't be an extra read
at all.
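[To illustrate the ->io_pages point Christoph makes here, a schematic with hypothetical names (eb_sketch, start_read, page_read_done; C11 atomics stand in for the kernel's atomic_t, and the real completion path lives in fs/btrfs/extent_io.c): after this patch the submitter sets the in-flight counter to num_pages and submits exactly num_pages pages, so the per-page completions always drive the counter to zero.]

#include <stdatomic.h>

struct eb_sketch {
	atomic_int io_pages;	/* page reads still in flight */
};

/* Submission side: every page is submitted, so the in-flight counter is
 * simply the page count and cannot disagree with the number of
 * completions that will arrive. */
static void start_read(struct eb_sketch *eb, int num_pages)
{
	atomic_store(&eb->io_pages, num_pages);
	/* ... submit one contiguous bio covering all num_pages pages ... */
}

/* Completion side, called once per page: the final completion is what
 * marks the buffer uptodate and releases the reference held for the
 * read, which is why the counter must be exact. */
static void page_read_done(struct eb_sketch *eb)
{
	if (atomic_fetch_sub(&eb->io_pages, 1) == 1) {
		/* last page: mark the buffer uptodate, drop the read ref */
	}
}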
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 302af9b01bda2a..26d8162bee000d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int locked_pages = 0;
 	int all_uptodate = 1;
 	int num_pages;
-	unsigned long num_reads = 0;
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.opf = REQ_OP_READ,
 		.mirror_num = mirror_num,
@@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	 */
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		if (!PageUptodate(page)) {
-			num_reads++;
+		if (!PageUptodate(page))
 			all_uptodate = 0;
-		}
 	}

 	if (all_uptodate) {
@@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,

 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_reads);
+	atomic_set(&eb->io_pages, num_pages);
 	/*
 	 * It is possible for release_folio to clear the TREE_REF bit before we
 	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
 	 */
@@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];

-		if (!PageUptodate(page)) {
-			ClearPageError(page);
-			submit_extent_page(&bio_ctrl, page_offset(page), page,
-					   PAGE_SIZE, 0);
-		} else {
-			unlock_page(page);
-		}
+		ClearPageError(page);
+		submit_extent_page(&bio_ctrl, page_offset(page), page,
+				   PAGE_SIZE, 0);
 	}

 	submit_one_bio(&bio_ctrl);
Currently read_extent_buffer_pages skips pages that are already uptodate
when reading in an extent_buffer. While this reduces the amount of data
read, it increases the number of I/O operations as we now need to do
multiple I/Os when reading an extent buffer with one or more uptodate
pages in the middle of it. On any modern storage device, be that hard
drives or SSDs this actually decreases I/O performance. Fortunately
this case is pretty rare as the pages are always initially read together
and then aged the same way. Besides simplifying the code a bit as-is
this will allow for major simplifications to the I/O completion handler
later on.

Note that the case where all pages are uptodate is still handled by an
optimized fast path that does not read any data from disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)
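[To make the I/O-count argument in the commit message concrete, a toy user-space illustration; count_bios is a hypothetical helper, not kernel code. With the old skip-uptodate behavior, an uptodate page in the middle of a 4-page extent buffer splits the read into two bios, while the new behavior always issues one contiguous read.]

#include <stdbool.h>
#include <stdio.h>

/* Count the bios needed when uptodate pages are skipped: every gap in
 * the run of !uptodate pages starts a new, discontiguous I/O. */
static int count_bios(const bool *uptodate, int num_pages)
{
	int bios = 0;
	bool in_io = false;

	for (int i = 0; i < num_pages; i++) {
		if (!uptodate[i]) {
			if (!in_io)
				bios++;		/* a new discontiguous I/O starts */
			in_io = true;
		} else {
			in_io = false;		/* skipping this page breaks the bio */
		}
	}
	return bios;
}

int main(void)
{
	/* page 2 of 4 is already uptodate */
	bool uptodate[4] = { false, false, true, false };

	/* old behavior: skip uptodate pages -> 2 separate bios */
	printf("skipping uptodate pages: %d bios\n", count_bios(uptodate, 4));

	/* new behavior: read everything -> always 1 contiguous bio */
	printf("reading all pages:       1 bio\n");
	return 0;
}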