Message ID | b709ff69f5d190ec620b7e4a21530be08442bf4b.1694201483.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: don't clear uptodate on write errors | expand |
On Fri, Sep 08, 2023 at 03:31:39PM -0400, Josef Bacik wrote: > We have been consistently seeing hangs with generic/648 in our subpage > GitHub CI setup. This is a classic deadlock, we are calling > btrfs_read_folio() on a folio, which requires holding the folio lock on > the folio, and then finding a ordered extent that overlaps that range > and calling btrfs_start_ordered_extent(), which then tries to write out > the dirty page, which requires taking the folio lock and then we > deadlock. > > The hang happens because we're writing to range [1271750656, 1271767040), page > index [77621, 77622], and page 77621 is !Uptodate. It is also Dirty, so we call > btrfs_read_folio() for 77621 and which does btrfs_lock_and_flush_ordered_range() > for that range, and we find an ordered extent which is [1271644160, 1271746560), > page index [77615, 77621]. The page indexes overlap, but the actual bytes don't > overlap. We're holding the page lock for 77621, then call > btrfs_lock_and_flush_ordered_range() which tries to flush the dirty page, and > tries to lock 77621 again and then we deadlock. > > The byte ranges do not overlap, but with subpage support if we clear > uptodate on any portion of the page we mark the entire thing as not > uptodate. Right, the split state handling. > We have been clearing page uptodate on write errors, but no other file > system does this, and is in fact incorrect. This doesn't hurt us in the > !subpage case because we can't end up with overlapped ranges that don't > also overlap on the page. > > Fix this by not clearing uptodate when we have a write error. The only > thing we should be doing in this case is setting the mapping error and > carrying on. This makes it so we would no longer call > btrfs_read_folio() on the page as it's uptodate and eliminates the > deadlock. > > With this patch we're now able to make it through a full xfstests run on > our subpage blocksize vms. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> I've marked it for stable 6.1 but it does not apply cleanly due to cleanups, btrfs_page_clear_uptodate() was called from end_extent_writepage() and open coded in 9783e4deed72 ("btrfs: remove end_extent_writepage"). Any backport needs to be careful, obviously the call can be removed from the helper but this might not be the only place. Added to misc-next, thanks.
On 2023/9/9 05:01, Josef Bacik wrote: > We have been consistently seeing hangs with generic/648 in our subpage > GitHub CI setup. This is a classic deadlock, we are calling > btrfs_read_folio() on a folio, which requires holding the folio lock on > the folio, and then finding a ordered extent that overlaps that range > and calling btrfs_start_ordered_extent(), which then tries to write out > the dirty page, which requires taking the folio lock and then we > deadlock. Thanks for pinging this down, this is indeed a bug for subpage case only. Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > > The hang happens because we're writing to range [1271750656, 1271767040), page > index [77621, 77622], and page 77621 is !Uptodate. It is also Dirty, so we call > btrfs_read_folio() for 77621 and which does btrfs_lock_and_flush_ordered_range() > for that range, and we find an ordered extent which is [1271644160, 1271746560), > page index [77615, 77621]. The page indexes overlap, but the actual bytes don't > overlap. We're holding the page lock for 77621, then call > btrfs_lock_and_flush_ordered_range() which tries to flush the dirty page, and > tries to lock 77621 again and then we deadlock. > > The byte ranges do not overlap, but with subpage support if we clear > uptodate on any portion of the page we mark the entire thing as not > uptodate. > > We have been clearing page uptodate on write errors, but no other file > system does this, and is in fact incorrect. This doesn't hurt us in the > !subpage case because we can't end up with overlapped ranges that don't > also overlap on the page. > > Fix this by not clearing uptodate when we have a write error. The only > thing we should be doing in this case is setting the mapping error and > carrying on. This makes it so we would no longer call > btrfs_read_folio() on the page as it's uptodate and eliminates the > deadlock. > > With this patch we're now able to make it through a full xfstests run on > our subpage blocksize vms. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/btrfs/extent_io.c | 9 +-------- > fs/btrfs/inode.c | 4 ---- > 2 files changed, 1 insertion(+), 12 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index ac3fca5a5e41..6954ae763b86 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -484,10 +484,8 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio) > bvec->bv_offset, bvec->bv_len); > > btrfs_finish_ordered_extent(bbio->ordered, page, start, len, !error); > - if (error) { > - btrfs_page_clear_uptodate(fs_info, page, start, len); > + if (error) > mapping_set_error(page->mapping, error); > - } > btrfs_page_clear_writeback(fs_info, page, start, len); > } > > @@ -1456,8 +1454,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl > if (ret) { > btrfs_mark_ordered_io_finished(BTRFS_I(inode), page, page_start, > PAGE_SIZE, !ret); > - btrfs_page_clear_uptodate(btrfs_sb(inode->i_sb), page, > - page_start, PAGE_SIZE); > mapping_set_error(page->mapping, ret); > } > unlock_page(page); > @@ -1624,8 +1620,6 @@ static void extent_buffer_write_end_io(struct btrfs_bio *bbio) > struct page *page = bvec->bv_page; > u32 len = bvec->bv_len; > > - if (!uptodate) > - btrfs_page_clear_uptodate(fs_info, page, start, len); > btrfs_page_clear_writeback(fs_info, page, start, len); > bio_offset += len; > } > @@ -2201,7 +2195,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page, > if (ret) { > btrfs_mark_ordered_io_finished(BTRFS_I(inode), page, > cur, cur_len, !ret); > - btrfs_page_clear_uptodate(fs_info, page, cur, cur_len); > mapping_set_error(page->mapping, ret); > } > btrfs_page_unlock_writer(fs_info, page, cur, cur_len); > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index f09fbdc43f0f..478999dcb2a3 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1085,9 +1085,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode, > btrfs_mark_ordered_io_finished(inode, locked_page, > page_start, PAGE_SIZE, > !ret); > - btrfs_page_clear_uptodate(inode->root->fs_info, > - locked_page, page_start, > - PAGE_SIZE); > mapping_set_error(locked_page->mapping, ret); > unlock_page(locked_page); > } > @@ -2791,7 +2788,6 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) > mapping_set_error(page->mapping, ret); > btrfs_mark_ordered_io_finished(inode, page, page_start, > PAGE_SIZE, !ret); > - btrfs_page_clear_uptodate(fs_info, page, page_start, PAGE_SIZE); > clear_page_dirty_for_io(page); > } > btrfs_page_clear_checked(fs_info, page, page_start, PAGE_SIZE);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index ac3fca5a5e41..6954ae763b86 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -484,10 +484,8 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio) bvec->bv_offset, bvec->bv_len); btrfs_finish_ordered_extent(bbio->ordered, page, start, len, !error); - if (error) { - btrfs_page_clear_uptodate(fs_info, page, start, len); + if (error) mapping_set_error(page->mapping, error); - } btrfs_page_clear_writeback(fs_info, page, start, len); } @@ -1456,8 +1454,6 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl if (ret) { btrfs_mark_ordered_io_finished(BTRFS_I(inode), page, page_start, PAGE_SIZE, !ret); - btrfs_page_clear_uptodate(btrfs_sb(inode->i_sb), page, - page_start, PAGE_SIZE); mapping_set_error(page->mapping, ret); } unlock_page(page); @@ -1624,8 +1620,6 @@ static void extent_buffer_write_end_io(struct btrfs_bio *bbio) struct page *page = bvec->bv_page; u32 len = bvec->bv_len; - if (!uptodate) - btrfs_page_clear_uptodate(fs_info, page, start, len); btrfs_page_clear_writeback(fs_info, page, start, len); bio_offset += len; } @@ -2201,7 +2195,6 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page, if (ret) { btrfs_mark_ordered_io_finished(BTRFS_I(inode), page, cur, cur_len, !ret); - btrfs_page_clear_uptodate(fs_info, page, cur, cur_len); mapping_set_error(page->mapping, ret); } btrfs_page_unlock_writer(fs_info, page, cur, cur_len); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f09fbdc43f0f..478999dcb2a3 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1085,9 +1085,6 @@ static void submit_uncompressed_range(struct btrfs_inode *inode, btrfs_mark_ordered_io_finished(inode, locked_page, page_start, PAGE_SIZE, !ret); - btrfs_page_clear_uptodate(inode->root->fs_info, - locked_page, page_start, - PAGE_SIZE); mapping_set_error(locked_page->mapping, ret); unlock_page(locked_page); } @@ -2791,7 +2788,6 @@ static void btrfs_writepage_fixup_worker(struct btrfs_work *work) mapping_set_error(page->mapping, ret); btrfs_mark_ordered_io_finished(inode, page, page_start, PAGE_SIZE, !ret); - btrfs_page_clear_uptodate(fs_info, page, page_start, PAGE_SIZE); clear_page_dirty_for_io(page); } btrfs_page_clear_checked(fs_info, page, page_start, PAGE_SIZE);
We have been consistently seeing hangs with generic/648 in our subpage GitHub CI setup. This is a classic deadlock, we are calling btrfs_read_folio() on a folio, which requires holding the folio lock on the folio, and then finding a ordered extent that overlaps that range and calling btrfs_start_ordered_extent(), which then tries to write out the dirty page, which requires taking the folio lock and then we deadlock. The hang happens because we're writing to range [1271750656, 1271767040), page index [77621, 77622], and page 77621 is !Uptodate. It is also Dirty, so we call btrfs_read_folio() for 77621 and which does btrfs_lock_and_flush_ordered_range() for that range, and we find an ordered extent which is [1271644160, 1271746560), page index [77615, 77621]. The page indexes overlap, but the actual bytes don't overlap. We're holding the page lock for 77621, then call btrfs_lock_and_flush_ordered_range() which tries to flush the dirty page, and tries to lock 77621 again and then we deadlock. The byte ranges do not overlap, but with subpage support if we clear uptodate on any portion of the page we mark the entire thing as not uptodate. We have been clearing page uptodate on write errors, but no other file system does this, and is in fact incorrect. This doesn't hurt us in the !subpage case because we can't end up with overlapped ranges that don't also overlap on the page. Fix this by not clearing uptodate when we have a write error. The only thing we should be doing in this case is setting the mapping error and carrying on. This makes it so we would no longer call btrfs_read_folio() on the page as it's uptodate and eliminates the deadlock. With this patch we're now able to make it through a full xfstests run on our subpage blocksize vms. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/btrfs/extent_io.c | 9 +-------- fs/btrfs/inode.c | 4 ---- 2 files changed, 1 insertion(+), 12 deletions(-)