diff mbox series

[08/16] btrfs: stop setting PageError in the data I/O path

Message ID 20230523081322.331337-9-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/16] btrfs: fix range_end calculation in extent_write_locked_range | expand

Commit Message

Christoph Hellwig May 23, 2023, 8:13 a.m. UTC
PageError is not used by the VFS/MM and deprecated.  Btrfs now only sets
the flag and never clears it for data pages, so just remove all places
setting it, and the subpage error bit.

Note that the error propagation for superblock writes still uses
PageError for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 24 +++++-------------------
 fs/btrfs/inode.c     |  3 ---
 fs/btrfs/subpage.c   | 34 ----------------------------------
 fs/btrfs/subpage.h   | 10 ++++------
 4 files changed, 9 insertions(+), 62 deletions(-)

Comments

David Sterba May 29, 2023, 5:52 p.m. UTC | #1
On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> PageError is not used by the VFS/MM and deprecated.

Is there some reference other than code? I remember LSF/MM talks about
writeback, reducing page flags but I can't find anything that would say
that PageError is not used anymore. For such core changes in the
neighboring subysystems I kind of rely on lwn.net, hearsay or accidental
notice on fsdevel.

Removing the Error bit handling looks overall good but I'd also need to
refresh my understanding of the interactions with writeback.
Christoph Hellwig May 30, 2023, 5:45 a.m. UTC | #2
On Mon, May 29, 2023 at 07:52:10PM +0200, David Sterba wrote:
> On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> > PageError is not used by the VFS/MM and deprecated.
> 
> Is there some reference other than code? I remember LSF/MM talks about
> writeback, reducing page flags but I can't find anything that would say
> that PageError is not used anymore. For such core changes in the
> neighboring subysystems I kind of rely on lwn.net, hearsay or accidental
> notice on fsdevel.
> 
> Removing the Error bit handling looks overall good but I'd also need to
> refresh my understanding of the interactions with writeback.

willy is the driving force behind the PageErro removal, so maybe he
has a coherent writeup.  But the basic idea is:

 - page flag space availability is limited, and killing any one we can
   easily is a good thing
 - PageError is not well defined and not part of any VM or VFS contract,
   so in addition to freeing up space it also generally tends to remove
   confusion.
Matthew Wilcox May 30, 2023, 6:08 a.m. UTC | #3
On Tue, May 30, 2023 at 07:45:47AM +0200, Christoph Hellwig wrote:
> On Mon, May 29, 2023 at 07:52:10PM +0200, David Sterba wrote:
> > On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> > > PageError is not used by the VFS/MM and deprecated.
> > 
> > Is there some reference other than code? I remember LSF/MM talks about
> > writeback, reducing page flags but I can't find anything that would say
> > that PageError is not used anymore. For such core changes in the
> > neighboring subysystems I kind of rely on lwn.net, hearsay or accidental
> > notice on fsdevel.
> > 
> > Removing the Error bit handling looks overall good but I'd also need to
> > refresh my understanding of the interactions with writeback.
> 
> willy is the driving force behind the PageErro removal, so maybe he
> has a coherent writeup.  But the basic idea is:
> 
>  - page flag space availability is limited, and killing any one we can
>    easily is a good thing
>  - PageError is not well defined and not part of any VM or VFS contract,
>    so in addition to freeing up space it also generally tends to remove
>    confusion.

I don't think I have a coherent writeup.  Basically:

 - The VFS and VM do not check the error flag

   $ git grep folio_test_error
   fs/gfs2/lops.c: if (folio_test_error(folio))
   mm/migrate.c:   if (folio_test_error(folio))

   (the use in mm/migrate.c replicates the error flag on migration)

   A similar grep -w PageError finds only tests in filesystems and
   comments.

 - We do not document clearly under what circumstances the error flag
   is set or cleared

If we can remove all uses of testing the error flag, then we can remove
everywhere that sets or clears the flag.  This is usually a simple
matter of checking folio_test_uptodate() instead of !PageError(),
since the folio will not be marked uptodate if there is a read error.
Write errors are not normally tracked on a per-folio basis.  Instead they
are tracked through mapping_set_error().

I did a number of these conversions about a year ago; I'm happy Christoph
is making progress with btrfs here.  See 'git log 6e8e79fc8443' for many
of them.
David Sterba May 30, 2023, 1:23 p.m. UTC | #4
On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> Note that the error propagation for superblock writes still uses
> PageError for now.

This patchset cleans up all the other Error bits so for super block we
can first switch to another flag like PageChecked, reworking the
separate page for superblock I tired some time ago still had problems.
David Sterba May 30, 2023, 1:34 p.m. UTC | #5
On Tue, May 30, 2023 at 07:08:38AM +0100, Matthew Wilcox wrote:
> On Tue, May 30, 2023 at 07:45:47AM +0200, Christoph Hellwig wrote:
> > On Mon, May 29, 2023 at 07:52:10PM +0200, David Sterba wrote:
> > > On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> > > > PageError is not used by the VFS/MM and deprecated.
> > > 
> > > Is there some reference other than code? I remember LSF/MM talks about
> > > writeback, reducing page flags but I can't find anything that would say
> > > that PageError is not used anymore. For such core changes in the
> > > neighboring subysystems I kind of rely on lwn.net, hearsay or accidental
> > > notice on fsdevel.
> > > 
> > > Removing the Error bit handling looks overall good but I'd also need to
> > > refresh my understanding of the interactions with writeback.
> > 
> > willy is the driving force behind the PageErro removal, so maybe he
> > has a coherent writeup.  But the basic idea is:
> > 
> >  - page flag space availability is limited, and killing any one we can
> >    easily is a good thing
> >  - PageError is not well defined and not part of any VM or VFS contract,
> >    so in addition to freeing up space it also generally tends to remove
> >    confusion.
> 
> I don't think I have a coherent writeup.  Basically:
> 
>  - The VFS and VM do not check the error flag
> 
>    $ git grep folio_test_error
>    fs/gfs2/lops.c: if (folio_test_error(folio))
>    mm/migrate.c:   if (folio_test_error(folio))
> 
>    (the use in mm/migrate.c replicates the error flag on migration)
> 
>    A similar grep -w PageError finds only tests in filesystems and
>    comments.
> 
>  - We do not document clearly under what circumstances the error flag
>    is set or cleared
> 
> If we can remove all uses of testing the error flag, then we can remove
> everywhere that sets or clears the flag.  This is usually a simple
> matter of checking folio_test_uptodate() instead of !PageError(),
> since the folio will not be marked uptodate if there is a read error.
> Write errors are not normally tracked on a per-folio basis.  Instead they
> are tracked through mapping_set_error().
> 
> I did a number of these conversions about a year ago; I'm happy Christoph
> is making progress with btrfs here.  See 'git log 6e8e79fc8443' for many
> of them.

Thank you, that helped.

I found one case of folio_set_error() that seems to be redundant:

   189  static noinline void end_compressed_writeback(const struct compressed_bio *cb)
   190  {
   191          struct inode *inode = &cb->bbio.inode->vfs_inode;
   192          struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
   193          unsigned long index = cb->start >> PAGE_SHIFT;
   194          unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT;
   195          struct folio_batch fbatch;
   196          const int errno = blk_status_to_errno(cb->bbio.bio.bi_status);
   197          int i;
   198          int ret;
   199  
   200          if (errno)
   201                  mapping_set_error(inode->i_mapping, errno);
   202  
   203          folio_batch_init(&fbatch);
   204          while (index <= end_index) {
   205                  ret = filemap_get_folios(inode->i_mapping, &index, end_index,
   206                                  &fbatch);
   207  
   208                  if (ret == 0)
   209                          return;
   210  
   211                  for (i = 0; i < ret; i++) {
   212                          struct folio *folio = fbatch.folios[i];
   213  
   214                          if (errno)
   215                                  folio_set_error(folio);
                                        ^^^^^^^^^^^^^^^^^^^^^^^

The error is set on the mapping on line 201 under the same condition.

With that removed and the super block write error handling moved away
from the Error bit the btrfs code will be Error-clean.

   216                          btrfs_page_clamp_clear_writeback(fs_info, &folio->page,
   217                                                           cb->start, cb->len);
   218                  }
   219                  folio_batch_release(&fbatch);
   220          }
   221          /* the inode may be gone now */
   222  }
Christoph Hellwig May 30, 2023, 2:24 p.m. UTC | #6
On Tue, May 30, 2023 at 03:23:11PM +0200, David Sterba wrote:
> On Tue, May 23, 2023 at 10:13:14AM +0200, Christoph Hellwig wrote:
> > Note that the error propagation for superblock writes still uses
> > PageError for now.
> 
> This patchset cleans up all the other Error bits so for super block we
> can first switch to another flag like PageChecked, reworking the
> separate page for superblock I tired some time ago still had problems.

Yes.  But the superblock writing all works based on the block devices
page cache, so it doesn't interact with the per-normal file pagecache
touched here, or the magic btree inode page cache touched in the
previous series.  I was planning to take a closer look at the
superblock handling when I find some time.
Christoph Hellwig May 30, 2023, 2:26 p.m. UTC | #7
On Tue, May 30, 2023 at 03:34:46PM +0200, David Sterba wrote:
> I found one case of folio_set_error() that seems to be redundant:

Ooops, I missed that as it's one of the few folio based things in
btrfs.  I'll fix it up together with the other little things in a
resend.
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d7b31888efa17a..28610ed0fae913 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -223,8 +223,6 @@  static int process_one_page(struct btrfs_fs_info *fs_info,
 
 	if (page_ops & PAGE_SET_ORDERED)
 		btrfs_page_clamp_set_ordered(fs_info, page, start, len);
-	if (page_ops & PAGE_SET_ERROR)
-		btrfs_page_clamp_set_error(fs_info, page, start, len);
 	if (page_ops & PAGE_START_WRITEBACK) {
 		btrfs_page_clamp_clear_dirty(fs_info, page, start, len);
 		btrfs_page_clamp_set_writeback(fs_info, page, start, len);
@@ -497,12 +495,10 @@  static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	ASSERT(page_offset(page) <= start &&
 	       start + len <= page_offset(page) + PAGE_SIZE);
 
-	if (uptodate && btrfs_verify_page(page, start)) {
+	if (uptodate && btrfs_verify_page(page, start))
 		btrfs_page_set_uptodate(fs_info, page, start, len);
-	} else {
+	else
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
-		btrfs_page_set_error(fs_info, page, start, len);
-	}
 
 	if (!btrfs_is_subpage(fs_info, page))
 		unlock_page(page);
@@ -530,7 +526,6 @@  void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
 		len = end + 1 - start;
 
 		btrfs_page_clear_uptodate(fs_info, page, start, len);
-		btrfs_page_set_error(fs_info, page, start, len);
 		ret = err < 0 ? err : -EIO;
 		mapping_set_error(page->mapping, ret);
 	}
@@ -1059,7 +1054,6 @@  static int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 	ret = set_page_extent_mapped(page);
 	if (ret < 0) {
 		unlock_extent(tree, start, end, NULL);
-		btrfs_page_set_error(fs_info, page, start, PAGE_SIZE);
 		unlock_page(page);
 		return ret;
 	}
@@ -1263,11 +1257,9 @@  static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
 		}
 		ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
 				delalloc_end, &page_started, &nr_written, wbc);
-		if (ret) {
-			btrfs_page_set_error(inode->root->fs_info, page,
-					     page_offset(page), PAGE_SIZE);
+		if (ret)
 			return ret;
-		}
+
 		/*
 		 * delalloc_end is already one less than the total length, so
 		 * we don't subtract one from PAGE_SIZE
@@ -1420,7 +1412,6 @@  static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 
 		em = btrfs_get_extent(inode, NULL, 0, cur, end - cur + 1);
 		if (IS_ERR(em)) {
-			btrfs_page_set_error(fs_info, page, cur, end - cur + 1);
 			ret = PTR_ERR_OR_ZERO(em);
 			goto out_error;
 		}
@@ -1519,9 +1510,6 @@  static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 
 	WARN_ON(!PageLocked(page));
 
-	btrfs_page_clear_error(btrfs_sb(inode->i_sb), page,
-			       page_offset(page), PAGE_SIZE);
-
 	pg_offset = offset_in_page(i_size);
 	if (page->index > end_index ||
 	   (page->index == end_index && !pg_offset)) {
@@ -1534,10 +1522,8 @@  static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
 		memzero_page(page, pg_offset, PAGE_SIZE - pg_offset);
 
 	ret = set_page_extent_mapped(page);
-	if (ret < 0) {
-		SetPageError(page);
+	if (ret < 0)
 		goto done;
-	}
 
 	if (!bio_ctrl->extent_locked) {
 		ret = writepage_delalloc(BTRFS_I(inode), page, bio_ctrl->wbc);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c4d4ac0428ee74..35b99fde75abb1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1153,8 +1153,6 @@  static int submit_uncompressed_range(struct btrfs_inode *inode,
 			const u64 page_start = page_offset(locked_page);
 			const u64 page_end = page_start + PAGE_SIZE - 1;
 
-			btrfs_page_set_error(inode->root->fs_info, locked_page,
-					     page_start, PAGE_SIZE);
 			set_page_writeback(locked_page);
 			end_page_writeback(locked_page);
 			end_extent_writepage(locked_page, ret, page_start, page_end);
@@ -3028,7 +3026,6 @@  static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
 		mapping_set_error(page->mapping, ret);
 		end_extent_writepage(page, ret, page_start, page_end);
 		clear_page_dirty_for_io(page);
-		SetPageError(page);
 	}
 	btrfs_page_clear_checked(inode->root->fs_info, page, page_start, PAGE_SIZE);
 	unlock_page(page);
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 045117ca0ddc43..9e9a5e26a15736 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -100,9 +100,6 @@  void btrfs_init_subpage_info(struct btrfs_subpage_info *subpage_info, u32 sector
 	subpage_info->uptodate_offset = cur;
 	cur += nr_bits;
 
-	subpage_info->error_offset = cur;
-	cur += nr_bits;
-
 	subpage_info->dirty_offset = cur;
 	cur += nr_bits;
 
@@ -416,35 +413,6 @@  void btrfs_subpage_clear_uptodate(const struct btrfs_fs_info *fs_info,
 	spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
-void btrfs_subpage_set_error(const struct btrfs_fs_info *fs_info,
-		struct page *page, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-							error, start, len);
-	unsigned long flags;
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-	SetPageError(page);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
-void btrfs_subpage_clear_error(const struct btrfs_fs_info *fs_info,
-		struct page *page, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = (struct btrfs_subpage *)page->private;
-	unsigned int start_bit = subpage_calc_start_bit(fs_info, page,
-							error, start, len);
-	unsigned long flags;
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-	if (subpage_test_bitmap_all_zero(fs_info, subpage, error))
-		ClearPageError(page);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
 void btrfs_subpage_set_dirty(const struct btrfs_fs_info *fs_info,
 		struct page *page, u64 start, u32 len)
 {
@@ -606,7 +574,6 @@  bool btrfs_subpage_test_##name(const struct btrfs_fs_info *fs_info,	\
 	return ret;							\
 }
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(uptodate);
-IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(error);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(dirty);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(writeback);
 IMPLEMENT_BTRFS_SUBPAGE_TEST_OP(ordered);
@@ -674,7 +641,6 @@  bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 }
 IMPLEMENT_BTRFS_PAGE_OPS(uptodate, SetPageUptodate, ClearPageUptodate,
 			 PageUptodate);
-IMPLEMENT_BTRFS_PAGE_OPS(error, SetPageError, ClearPageError, PageError);
 IMPLEMENT_BTRFS_PAGE_OPS(dirty, set_page_dirty, clear_page_dirty_for_io,
 			 PageDirty);
 IMPLEMENT_BTRFS_PAGE_OPS(writeback, set_page_writeback, end_page_writeback,
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 0e80ad33690466..998c1b78066e53 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -8,17 +8,17 @@ 
 /*
  * Extra info for subpapge bitmap.
  *
- * For subpage we pack all uptodate/error/dirty/writeback/ordered bitmaps into
+ * For subpage we pack all uptodate/dirty/writeback/ordered bitmaps into
  * one larger bitmap.
  *
  * This structure records how they are organized in the bitmap:
  *
- * /- uptodate_offset	/- error_offset	/- dirty_offset
+ * /- uptodate_offset	/- dirty_offset	/- ordered_offset
  * |			|		|
  * v			v		v
- * |u|u|u|u|........|u|u|e|e|.......|e|e| ...	|o|o|
+ * |u|u|u|u|........|u|u|d|d|.......|d|d|o|o|.......|o|o|
  * |<- bitmap_nr_bits ->|
- * |<--------------- total_nr_bits ---------------->|
+ * |<----------------- total_nr_bits ------------------>|
  */
 struct btrfs_subpage_info {
 	/* Number of bits for each bitmap */
@@ -32,7 +32,6 @@  struct btrfs_subpage_info {
 	 * @bitmap_size, which is calculated from PAGE_SIZE / sectorsize.
 	 */
 	unsigned int uptodate_offset;
-	unsigned int error_offset;
 	unsigned int dirty_offset;
 	unsigned int writeback_offset;
 	unsigned int ordered_offset;
@@ -141,7 +140,6 @@  bool btrfs_page_clamp_test_##name(const struct btrfs_fs_info *fs_info,	\
 		struct page *page, u64 start, u32 len);
 
 DECLARE_BTRFS_SUBPAGE_OPS(uptodate);
-DECLARE_BTRFS_SUBPAGE_OPS(error);
 DECLARE_BTRFS_SUBPAGE_OPS(dirty);
 DECLARE_BTRFS_SUBPAGE_OPS(writeback);
 DECLARE_BTRFS_SUBPAGE_OPS(ordered);