Message ID | cover.1740635497.git.wqu@suse.com
---|---
Series | btrfs: make subpage handling be feature full
On Thu, Feb 27, 2025 at 04:24:38PM +1030, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Add a new bug fix which is exposed by recent 2K block size tests on
>   x86_64
>   It's a self-deadlock where folio_end_writeback() is called with the
>   subpage lock held, and folio_end_writeback() will eventually call
>   btrfs_release_folio() and try to lock the same spin lock.
>
> Since the introduction of btrfs subpage support in v5.15, there have
> been quite a few limitations:
>
> - No RAID56 support
>   Added in v5.19.
>
> - No memory efficient scrub support
>   Added in v6.4.
>
> - No block perfect compressed write support
>   Previously btrfs required the dirty range to be fully page aligned,
>   or it would skip compression completely.
>
>   Added in v6.13.
>
> - Various subpage related error handling fixes
>   Added in v6.14.
>
> - No support to create inline data extents
> - No partial uptodate page support
>   This is a long existing optimization supported by EXT4/XFS and is
>   required to pass generic/563 with subpage block size.
>
> The last two are addressed in this patchset.

That's great, thank you very much. I think not all patches have a
reviewed-by tag, but I'd like to get this to for-next very soon. The next
rc is rc5 and we should have all features in. This also means the 2K
nodesize block, so we can give it more testing.
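For reference, the self-deadlock mentioned in the changelog above is the
classic shape of re-acquiring a non-recursive spinlock from a callee. Below
is a minimal user-space sketch of that shape, using a pthread spinlock in
place of the btrfs subpage spinlock; release_folio_cb() and end_writeback()
are hypothetical stand-ins for the kernel paths (btrfs_release_folio() and
folio_end_writeback()), not the real code.

#include <pthread.h>
#include <stdio.h>

/* Stand-in for the per-folio btrfs subpage spinlock. */
static pthread_spinlock_t subpage_lock;

/* Hypothetical analogue of btrfs_release_folio(): it needs the same lock
 * to tear down the folio's private state. */
static void release_folio_cb(void)
{
	pthread_spin_lock(&subpage_lock);
	/* ... detach and free the folio's private state ... */
	pthread_spin_unlock(&subpage_lock);
}

/* Hypothetical analogue of folio_end_writeback(): once the last writeback
 * bit is cleared, the release path may run. */
static void end_writeback(void)
{
	release_folio_cb();
}

int main(void)
{
	pthread_spin_init(&subpage_lock, PTHREAD_PROCESS_PRIVATE);

	pthread_spin_lock(&subpage_lock);
	/* clear the per-block writeback bit, decide we are the last holder */
	pthread_spin_unlock(&subpage_lock);

	/*
	 * Safe ordering: the lock is dropped before end_writeback().
	 * The reported bug is the reverse ordering, i.e. calling
	 * end_writeback() while subpage_lock is still held, so
	 * release_folio_cb() spins on a lock its own caller owns and the
	 * task deadlocks against itself.
	 */
	end_writeback();

	printf("writeback ended without self-deadlock\n");
	pthread_spin_destroy(&subpage_lock);
	return 0;
}

The comments show the safe ordering; the reported bug is the reverse
ordering, as described in the changelog.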
On 2025/2/27 21:46, David Sterba wrote:
> On Thu, Feb 27, 2025 at 04:24:38PM +1030, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Add a new bug fix which is exposed by recent 2K block size tests on
>>   x86_64
>>   It's a self-deadlock where folio_end_writeback() is called with the
>>   subpage lock held, and folio_end_writeback() will eventually call
>>   btrfs_release_folio() and try to lock the same spin lock.
>>
>> Since the introduction of btrfs subpage support in v5.15, there have
>> been quite a few limitations:
>>
>> - No RAID56 support
>>   Added in v5.19.
>>
>> - No memory efficient scrub support
>>   Added in v6.4.
>>
>> - No block perfect compressed write support
>>   Previously btrfs required the dirty range to be fully page aligned,
>>   or it would skip compression completely.
>>
>>   Added in v6.13.
>>
>> - Various subpage related error handling fixes
>>   Added in v6.14.
>>
>> - No support to create inline data extents
>> - No partial uptodate page support
>>   This is a long existing optimization supported by EXT4/XFS and is
>>   required to pass generic/563 with subpage block size.
>>
>> The last two are addressed in this patchset.
>
> That's great, thank you very much. I think not all patches have a
> reviewed-by tag, but I'd like to get this to for-next very soon. The next
> rc is rc5 and we should have all features in. This also means the 2K
> nodesize block, so we can give it more testing.

Now pushed to for-next, with all patches reviewed (or acked) by Filipe.
Great thanks to Filipe for the review!

For the 2K one, since it's just two small patches I'm also fine with
pushing them now.

Just do not forget that we still need the progs patches, and there are a
dozen known failures from fstests that I'm not able to address any time
soon.

Thanks,
Qu
On Fri, Feb 28, 2025 at 01:44:04PM +1030, Qu Wenruo wrote:
> For the 2K one, since it's just two small patches I'm also fine with
> pushing them now.
>
> Just do not forget that we still need the progs patches, and there are a
> dozen known failures from fstests that I'm not able to address any time
> soon.

Yeah, the mkfs support can go to the next minor progs release. About the
status, we can print a warning and document it. No need to focus on fixing
the fstests failures, I think stress testing will be sufficient for now.
On 2025/2/28 23:12, David Sterba wrote:
> On Fri, Feb 28, 2025 at 01:44:04PM +1030, Qu Wenruo wrote:
>> For the 2K one, since it's just two small patches I'm also fine with
>> pushing them now.
>>
>> Just do not forget that we still need the progs patches, and there are a
>> dozen known failures from fstests that I'm not able to address any time
>> soon.
>
> Yeah, the mkfs support can go to the next minor progs release. About the
> status, we can print a warning and document it. No need to focus on fixing
> the fstests failures, I think stress testing will be sufficient for now.

It turns out that this will not be smooth sailing.

There is a huge conflict between our async extent handling and the subpage
handling for writeback.

Our async extent mechanism can mark a folio writeback at any time.
(At submission time we keep the range locked, and let the compression work
happen in another thread.)

If we have two async extents inside the same folio, we can hit the
following race (64K page size, 4K fs block size):

0          32K        64K
|<- AE 1 ->|<- AE 2 ->|

Thread A (AE 1)                       | Thread B (AE 2)
--------------------------------------+------------------------------
submit_one_async_extent()             |
|- process_one_folio()                |
   |- subpage_set_writeback()         |
                                      |
/* IO finished */                     |
end_compressed_writeback()            |
|- btrfs_folio_clear_writeback()      |
|  /* This is the last writeback      |
|     holder, should end the folio    |
|     writeback flag */               |
|- last = true                        |
|                                     | submit_one_async_extent()
|                                     | |- process_one_folio()
|                                     |    |- subpage_set_writeback()
|                                     |
|                                     | /* IO finished */
|                                     | end_compressed_writeback()
|                                     | |- btrfs_folio_clear_writeback()
|                                     | |  /* Again the last holder */
|                                     | |- last = true
|- folio_end_writeback()              |
                                      | |- folio_end_writeback()

This leads to two threads calling folio_end_writeback() on the same folio,
which will eventually trigger VM_BUG_ON_FOLIO() or other problems.

Furthermore we can not rely on folio->private to do anything after the
folio_end_writeback() call, because that call may unmap/invalidate the
folio.

What's worse, iomap's extra writeback accounting won't help either.
Iomap holds one extra writeback count before submitting the blocks inside
the folio, then drops that extra count after all blocks have been marked
writeback (submitted).

That solution requires all the blocks inside the folio to be submitted and
marked writeback at the same time, but our async extents break that
requirement completely.

So far I have no better solution than to disable block-perfect compression
first, then introduce the same extra-count scheme that iomap uses.

The proper solution is not only the iomap-style counting, but also making
the async extent submission mark the folios writeback. That will be quite
some work (part of the iomap migration plan).

Thanks,
Qu
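For reference, the iomap accounting described above amounts to holding one
extra writeback "holder" for the folio while its blocks are being submitted,
so a completing block can never drop the count to zero mid-submission. Below
is a minimal user-space sketch of that counting scheme using C11 atomics;
struct folio_writeback_state, submit_block() and submit_folio() are made-up
illustration names, not btrfs or iomap APIs.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Made-up per-folio state: counts blocks still under writeback. */
struct folio_writeback_state {
	atomic_int holders;
};

/*
 * Called once per block when its IO completes; returns true for the caller
 * that drops the count to zero, which must then end folio writeback.
 */
static bool end_block_writeback(struct folio_writeback_state *st)
{
	return atomic_fetch_sub(&st->holders, 1) == 1;
}

static void submit_block(struct folio_writeback_state *st)
{
	atomic_fetch_add(&st->holders, 1);
	/* ... queue the block's IO here ... */
}

/*
 * iomap-style submission: take one extra holder up front so the count
 * cannot reach zero until every block of the folio has been submitted.
 */
static void submit_folio(struct folio_writeback_state *st, int nr_blocks)
{
	atomic_init(&st->holders, 1);		/* the extra holder */

	for (int i = 0; i < nr_blocks; i++)
		submit_block(st);

	/* Drop the extra holder; if all IO already finished, end writeback
	 * here, exactly once. */
	if (end_block_writeback(st))
		printf("folio writeback ended at submit time\n");
}

int main(void)
{
	struct folio_writeback_state st;
	int nr_blocks = 16;	/* e.g. a 64K folio with 4K blocks */

	submit_folio(&st, nr_blocks);

	/* Simulate each block's IO completing later (normally from an IO
	 * completion thread). Exactly one completion sees the count hit 0. */
	for (int i = 0; i < nr_blocks; i++)
		if (end_block_writeback(&st))
			printf("folio writeback ended exactly once\n");
	return 0;
}

As the mail notes, this only works when all blocks of a folio are submitted
and marked writeback in one pass; async extents mark blocks writeback from
separate threads at arbitrary times, which is why the counting alone does
not cover the btrfs case.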