Message ID | 2faab8a96c6dd2a414a96e4cebae97ecbddf021d.1730269807.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: sector size < page size enhancement |
I know this is part of the subpage patches, but this is really a bug fix
for the existing subpage handling.

I'd appreciate it if anyone could give this a review.

Thanks,
Qu

On 2024/10/30 17:03, Qu Wenruo wrote:
> [BUG]
> Btrfs will fail generic/750 randomly if its sector size is smaller than
> page size.
>
> One of the warnings looks like this:
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 1 PID: 90263 at fs/btrfs/ordered-data.c:360 can_finish_ordered_extent+0x33c/0x390 [btrfs]
>  CPU: 1 UID: 0 PID: 90263 Comm: kworker/u18:1 Tainted: G OE 6.12.0-rc3-custom+ #79
>  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
>  pc : can_finish_ordered_extent+0x33c/0x390 [btrfs]
>  lr : can_finish_ordered_extent+0xdc/0x390 [btrfs]
>  Call trace:
>   can_finish_ordered_extent+0x33c/0x390 [btrfs]
>   btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
>   extent_writepage+0xfc/0x338 [btrfs]
>   extent_write_cache_pages+0x1d4/0x4b8 [btrfs]
>   btrfs_writepages+0x94/0x158 [btrfs]
>   do_writepages+0x74/0x190
>   filemap_fdatawrite_wbc+0x88/0xc8
>   start_delalloc_inodes+0x180/0x3b0 [btrfs]
>   btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
>   shrink_delalloc+0x11c/0x280 [btrfs]
>   flush_space+0x27c/0x310 [btrfs]
>   btrfs_async_reclaim_metadata_space+0xcc/0x208 [btrfs]
>   process_one_work+0x228/0x670
>   worker_thread+0x1bc/0x360
>   kthread+0x100/0x118
>   ret_from_fork+0x10/0x20
>  irq event stamp: 9784200
>  hardirqs last enabled at (9784199): [<ffffd21ec54dc01c>] _raw_spin_unlock_irqrestore+0x74/0x80
>  hardirqs last disabled at (9784200): [<ffffd21ec54db374>] _raw_spin_lock_irqsave+0x8c/0xa0
>  softirqs last enabled at (9784148): [<ffffd21ec472ff44>] handle_softirqs+0x45c/0x4b0
>  softirqs last disabled at (9784141): [<ffffd21ec46d01e4>] __do_softirq+0x1c/0x28
>  ---[ end trace 0000000000000000 ]---
>  BTRFS critical (device dm-2): bad ordered extent accounting, root=5 ino=1492 OE offset=1654784 OE len=57344 to_dec=49152 left=0
>
> [CAUSE]
> The function btrfs_mark_ordered_io_finished() is called for marking all
> ordered extents in the page range as finished, for error handling.
>
> But for sector size < page size cases, we can have multiple ordered
> extents in one page.
>
> If extent_writepage_io() failed (the only possible case is
> submit_one_sector() failed to grab an extent map), then the call site
> inside extent_writepage() will call btrfs_mark_ordered_io_finished() to
> finish the created ordered extents.
>
> However, some ranges of the ordered extent may have been submitted
> already, so calling btrfs_mark_ordered_io_finished() on the same range
> causes double accounting.
>
> [FIX]
> - Introduce a new member btrfs_bio_ctrl::last_submitted
>   This will track the last sector submitted through
>   extent_writepage_io().
>
>   So for the above extent_writepage() case, we will know exactly which
>   sectors are submitted and should not do the ordered extent accounting.
>
> - Introduce a helper cleanup_ordered_extents()
>   This will do a sector-by-sector cleanup, taking
>   btrfs_bio_ctrl::last_submitted and btrfs_bio_ctrl::submit_bitmap into
>   consideration.
>
>   Using @last_submitted is to avoid double accounting on the submitted
>   ranges.
>   Meanwhile using @submit_bitmap is to avoid touching ranges going
>   through compression.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 41 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e629d2ee152a..427bfbe737f2 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -108,6 +108,14 @@ struct btrfs_bio_ctrl {
>  	 * This is to avoid touching ranges covered by compression/inline.
>  	 */
>  	unsigned long submit_bitmap;
> +
> +	/*
> +	 * The end (exclusive) of the last submitted range in the folio.
> +	 *
> +	 * This is for sector size < page size case where we may hit error
> +	 * half way.
> +	 */
> +	u64 last_submitted;
>  };
>
>  static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
> @@ -1435,6 +1443,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>  		ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
>  		if (ret < 0)
>  			goto out;
> +		bio_ctrl->last_submitted = cur + fs_info->sectorsize;
>  		submitted_io = true;
>  	}
>  out:
> @@ -1453,6 +1462,24 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
>  	return ret;
>  }
>
> +static void cleanup_ordered_extents(struct btrfs_inode *inode,
> +				    struct folio *folio, u64 file_pos,
> +				    u64 num_bytes, unsigned long *bitmap)
> +{
> +	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> +	unsigned int cur_bit = (file_pos - folio_pos(folio)) >> fs_info->sectorsize_bits;
> +
> +	for_each_set_bit_from(cur_bit, bitmap, fs_info->sectors_per_page) {
> +		u64 cur_pos = folio_pos(folio) + (cur_bit << fs_info->sectorsize_bits);
> +
> +		if (cur_pos >= file_pos + num_bytes)
> +			break;
> +
> +		btrfs_mark_ordered_io_finished(inode, folio, cur_pos,
> +					       fs_info->sectorsize, false);
> +	}
> +}
> +
>  /*
>   * the writepage semantics are similar to regular writepage. extent
>   * records are inserted to lock ranges in the tree, and as dirty areas
> @@ -1492,6 +1519,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
>  	 * The proper bitmap can only be initialized until writepage_delalloc().
>  	 */
>  	bio_ctrl->submit_bitmap = (unsigned long)-1;
> +	bio_ctrl->last_submitted = page_start;
>  	ret = set_folio_extent_mapped(folio);
>  	if (ret < 0)
>  		goto done;
> @@ -1511,8 +1539,10 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
>
>  done:
>  	if (ret) {
> -		btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
> -					       page_start, PAGE_SIZE, !ret);
> +		cleanup_ordered_extents(BTRFS_I(inode), folio,
> +				bio_ctrl->last_submitted,
> +				page_start + PAGE_SIZE - bio_ctrl->last_submitted,
> +				&bio_ctrl->submit_bitmap);
>  		mapping_set_error(folio->mapping, ret);
>  	}
>
> @@ -2288,14 +2318,17 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
>  		 * extent_writepage_io() will do the truncation correctly.
>  		 */
>  		bio_ctrl.submit_bitmap = (unsigned long)-1;
> +		bio_ctrl.last_submitted = cur;
>  		ret = extent_writepage_io(BTRFS_I(inode), folio, cur, cur_len,
>  					  &bio_ctrl, i_size);
>  		if (ret == 1)
>  			goto next_page;
>
>  		if (ret) {
> -			btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
> -						       cur, cur_len, !ret);
> +			cleanup_ordered_extents(BTRFS_I(inode), folio,
> +					bio_ctrl.last_submitted,
> +					cur_end + 1 - bio_ctrl.last_submitted,
> +					&bio_ctrl.submit_bitmap);
>  			mapping_set_error(mapping, ret);
>  		}
>  		btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
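
To make the [FIX] part of the quoted patch easier to follow, here is a small userspace model of the sector-by-sector cleanup. It is only a sketch under stated assumptions, not kernel code: the function name, the `MODEL_*` constants (4K sectors in a 64K page) and the idea of returning a bitmap of "finished" sectors are all hypothetical, chosen so the logic can be compiled and checked outside the kernel. The real helper calls btrfs_mark_ordered_io_finished() for each selected sector instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for the model only: 4K sectors inside a 64K page. */
#define MODEL_SECTORSIZE_BITS  12u
#define MODEL_SECTORS_PER_PAGE 16u

/*
 * Hypothetical model of the cleanup loop in the patch.
 *
 * folio_start:   file offset of the folio
 * file_pos:      first byte to clean up (the patch passes last_submitted)
 * num_bytes:     length of the range to clean up
 * submit_bitmap: one bit per sector that this writepage call owns
 *
 * Returns a bitmap of the sectors that would be marked finished.
 */
static unsigned long model_cleanup_ordered_extents(uint64_t folio_start,
						   uint64_t file_pos,
						   uint64_t num_bytes,
						   unsigned long submit_bitmap)
{
	unsigned long finished = 0;
	unsigned int bit;

	assert(file_pos >= folio_start);

	/* Start at the first sector that was NOT submitted. */
	bit = (unsigned int)((file_pos - folio_start) >> MODEL_SECTORSIZE_BITS);

	for (; bit < MODEL_SECTORS_PER_PAGE; bit++) {
		uint64_t cur_pos = folio_start +
				   ((uint64_t)bit << MODEL_SECTORSIZE_BITS);

		/* Stop at the end of the requested range. */
		if (cur_pos >= file_pos + num_bytes)
			break;

		/* Sectors not in the bitmap belong to compression/inline. */
		if (!(submit_bitmap & (1UL << bit)))
			continue;

		finished |= 1UL << bit;
	}
	return finished;
}
```

For example, with the folio at offset 0x100000, all 16 bits set in the bitmap and the first four sectors already submitted (file_pos = 0x104000, num_bytes = 0xC000), the model returns 0xfff0: the four submitted sectors are left alone, which is what avoids the double accounting described in [CAUSE].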
On Sun, Nov 24, 2024 at 06:01:27PM +1030, Qu Wenruo wrote:
> I know this is part of the subpage patches, but this is really a bug fix
> for the existing subpage handling.
>
> Appreciate if anyone can give this a review.

Looks correct to me. One suggestion to clean up the parameters and to
pass bio_ctrl and read the last_submitted and the bitmap directly, so
something like that:

	cleanup_ordered_extents(BTRFS_I(inode), folio, &bio_ctrl, cur_end + 1);

replacing the parameters with the values in the function. Though after
another thought, the explicit expressions like
"page_start + PAGE_SIZE - bio_ctrl->last_submitted"
and "cur_end + 1 - bio_ctrl.last_submitted" make it a bit more readable.
Up to you.
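
The alternative signature suggested above could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct sketch_bio_ctrl` is a stand-in holding only the two members the helper reads, the `SKETCH_*` constants assume 4K sectors in a 64K page, and the function returns a count of sectors instead of calling btrfs_mark_ordered_io_finished(). The `end` argument is the exclusive end of the range, matching the `cur_end + 1` in the suggestion.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for the sketch only: 4K sectors inside a 64K page. */
#define SKETCH_SECTORSIZE_BITS  12u
#define SKETCH_SECTORS_PER_PAGE 16u

/* Hypothetical stand-in for the two btrfs_bio_ctrl members that are read. */
struct sketch_bio_ctrl {
	unsigned long submit_bitmap;
	uint64_t last_submitted;
};

/*
 * The start of the range and the bitmap are read from bio_ctrl inside the
 * helper, so the caller only passes the exclusive end of the range.
 *
 * Returns how many sectors would be marked finished.
 */
static unsigned int sketch_cleanup_ordered_extents(uint64_t folio_start,
				const struct sketch_bio_ctrl *bio_ctrl,
				uint64_t end)
{
	unsigned int nr_finished = 0;
	unsigned int bit;

	assert(bio_ctrl->last_submitted >= folio_start);
	bit = (unsigned int)((bio_ctrl->last_submitted - folio_start) >>
			     SKETCH_SECTORSIZE_BITS);

	for (; bit < SKETCH_SECTORS_PER_PAGE; bit++) {
		uint64_t cur_pos = folio_start +
				   ((uint64_t)bit << SKETCH_SECTORSIZE_BITS);

		if (cur_pos >= end)
			break;
		if (bio_ctrl->submit_bitmap & (1UL << bit))
			nr_finished++;
	}
	return nr_finished;
}
```

The trade-off is the one already noted in the review: the call sites get shorter, but the "end minus last_submitted" arithmetic is no longer visible where the helper is called.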
On 2024/11/27 02:38, David Sterba wrote:
> On Sun, Nov 24, 2024 at 06:01:27PM +1030, Qu Wenruo wrote:
>> I know this is part of the subpage patches, but this is really a bug fix
>> for the existing subpage handling.
>>
>> Appreciate if anyone can give this a review.
>
> Looks correct to me. One suggestion to clean up the parameters and to
> pass bio_ctrl and read the last_submitted and the bitmap directly, so
> something like that:
>
> 	cleanup_ordered_extents(BTRFS_I(inode), folio, &bio_ctrl, cur_end + 1);
>
> replacing the parameters with the values in the function. Though after
> another thought, the explicit expressions like
> "page_start + PAGE_SIZE - bio_ctrl->last_submitted"
> and "cur_end + 1 - bio_ctrl.last_submitted" make it a bit more readable.
> Up to you.

This one is replaced by this series:

https://lore.kernel.org/linux-btrfs/cover.1732596971.git.wqu@suse.com/

However, I'm still hitting hangs where some ordered extent never finishes.
(At least that is better than a crash, but still not ideal.)

Thanks,
Qu