mbox series

[v3,0/8] btrfs: make subpage handling be feature full

Message ID cover.1740990125.git.wqu@suse.com (mailing list archive)
Headers show
Series btrfs: make subpage handling be feature full | expand

Message

Qu Wenruo March 3, 2025, 8:35 a.m. UTC
[CHANGLOG]
v3:
- Drop the btrfs uncached write support
  It turns out that we can not move the folio_end_writeback() call out
  of the spinlock for now.

  Two or more async extents can race on the same folio to call
  folio_end_writeback() if not protected by a spinlock, can be
  reproduced by generic/127 run loops with experimental 2k block size
  patchset.

  Thus disabling uncached write is the smallest fix for now, and drop
  the previous calling-folio_end_writeback()-out-of-spinlock fix.

  Not sure if a full revert is better or not, there is still some valid
  FGP flag related cleanup in that commit.

v2:
- Add a new bug fix which is exposed by recent 2K block size tests on
  x86_64
  It's a self deadlock where the folio_end_writeback() is called with
  subpage lock hold, and folio_end_writeback() will eventually call
  btrfs_release_folio() and try to lock the same spin lock.

- Various grammar fixes and commit message/comments updates

- Address comments from Filipe and adds his tags

v1:
- Merge previous two patches into one
- Re-order the patches so preparation/fixes are always before feature
  enablement
- Update the commit message of the first patch
  So that we do not focus on the out-of-tree part, but explain why it's
  not ideal in the first place (double page zeroing), and just mention
  the behavior change in the future.

Since the introduction of btrfs subapge support in v5.15, there are
quite some limitations:

- No RAID56 support
  Added in v5.19.

- No memory efficient scrub support
  Added in v6.4.

- No block perfect compressed write support
  Previously btrfs requires the dirty range to be fully page aligned, or
  it will skip compression completely.

  Added in v6.13.

- Various subpage related error handling fixes
  Added in v6.14.

- No support to create inline data extent
- No partial uptodate page support
  This is a long existing optimization supported by EXT4/XFS and
  is required to pass generic/563 with subpage block size.

The last two are addressed in this patchset.

And since all features are implemented for subpage cases, the long
existing experimental warning message can be dropped, as it is already
causing a lot of concerns for users who checks the dmesg carefully.

Qu Wenruo (8):
  btrfs: disable uncached writes for now
  btrfs: prevent inline data extents read from touching blocks beyond
    its range
  btrfs: fix the qgroup data free range for inline data extents
  btrfs: introduce a read path dedicated extent lock helper
  btrfs: make btrfs_do_readpage() to do block-by-block read
  btrfs: allow buffered write to avoid full page read if it's block
    aligned
  btrfs: allow inline data extents creation if block size < page size
  btrfs: remove the subpage related warning message

 fs/btrfs/disk-io.c      |   5 -
 fs/btrfs/extent_io.c    | 228 +++++++++++++++++++++++++++++++++++-----
 fs/btrfs/file.c         |  18 +++-
 fs/btrfs/inode.c        |  29 ++---
 fs/btrfs/ordered-data.c |  23 +++-
 fs/btrfs/ordered-data.h |   8 +-
 6 files changed, 248 insertions(+), 63 deletions(-)

Comments

David Sterba March 3, 2025, 9:01 a.m. UTC | #1
On Mon, Mar 03, 2025 at 07:05:08PM +1030, Qu Wenruo wrote:
> [CHANGLOG]
> v3:
> - Drop the btrfs uncached write support
>   It turns out that we can not move the folio_end_writeback() call out
>   of the spinlock for now.
> 
>   Two or more async extents can race on the same folio to call
>   folio_end_writeback() if not protected by a spinlock, can be
>   reproduced by generic/127 run loops with experimental 2k block size
>   patchset.
> 
>   Thus disabling uncached write is the smallest fix for now, and drop
>   the previous calling-folio_end_writeback()-out-of-spinlock fix.
> 
>   Not sure if a full revert is better or not, there is still some valid
>   FGP flag related cleanup in that commit.

Let's keep it, the more compelling reason is that we want uncached
writes so we'll have the groundwork for that and ready for testing.

Do you intend to push v3 to for-next for 6.15? I know you removed the
v2, we can postpone it if there are more potential problems.
Qu Wenruo March 3, 2025, 9:19 a.m. UTC | #2
在 2025/3/3 19:31, David Sterba 写道:
> On Mon, Mar 03, 2025 at 07:05:08PM +1030, Qu Wenruo wrote:
>> [CHANGLOG]
>> v3:
>> - Drop the btrfs uncached write support
>>    It turns out that we can not move the folio_end_writeback() call out
>>    of the spinlock for now.
>>
>>    Two or more async extents can race on the same folio to call
>>    folio_end_writeback() if not protected by a spinlock, can be
>>    reproduced by generic/127 run loops with experimental 2k block size
>>    patchset.
>>
>>    Thus disabling uncached write is the smallest fix for now, and drop
>>    the previous calling-folio_end_writeback()-out-of-spinlock fix.
>>
>>    Not sure if a full revert is better or not, there is still some valid
>>    FGP flag related cleanup in that commit.
>
> Let's keep it, the more compelling reason is that we want uncached
> writes so we'll have the groundwork for that and ready for testing.
>
> Do you intend to push v3 to for-next for 6.15? I know you removed the
> v2, we can postpone it if there are more potential problems.
>

So far my local tests on both aarch64 (the usual test VM, 64K page size
and 4K block size) and x86_64 (with experimental 2K block size) shows no
problem so far.

And this time I have looped generic/127 for extra safety, so I guess it
should be fine this time.

Last but not the least, the analyze can explain all the writeback
related bugs I hit so far.

I'll take one or two days to shake out extra problems before pushing the
whole series (with 2K block size support) to for-next.

Thanks,
Qu