| Message ID | 305e53f3931e164c8ab3b09c059ea68bd792a4b2.1728256845.git.wqu@suse.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v2] btrfs: fix the delalloc range locking if sector size < page size |
On Mon, Oct 07, 2024 at 09:51:40AM +1030, Qu Wenruo wrote:
> Inside lock_delalloc_folios(), there are several problems related to
> sector size < page size handling:
>
> - Set the writer locks without checking if the folio is still valid
>   We call btrfs_folio_start_writer_lock() just like it's folio_lock().
>   But since the folio may not even be the folio of the current mapping,
>   we can easily screw up the folio->private.
>
> - The range is not clamped inside the page
>   This means we can overwrite other bitmaps if the start/len is not
>   properly handled, and trigger the btrfs_subpage_assert().
>
> - @processed_end is always rounded up to the page end
>   If the delalloc range is not page aligned, and we need to retry
>   (returning -EAGAIN), then we will unlock to the page end.
>
>   Thankfully this is not a huge problem, as now
>   btrfs_folio_end_writer_lock() can handle ranges larger than the locked
>   range, and only unlock what is already locked.
>
> Fix all these problems by:
>
> - Lock and check the folio first, then call
>   btrfs_folio_set_writer_lock()
>   So that if we get a folio not belonging to the inode, we won't
>   touch folio->private.
>
> - Properly truncate the range inside the page
>
> - Update @processed_end to the locked range end
>
> - Remove btrfs_folio_start_writer_lock()
>   Since all call sites only use btrfs_folio_set_writer_lock() now,
>   remove the unnecessary btrfs_folio_start_writer_lock().
>
> Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")

This commit is in 5.14, so we'd need a backport to 5.15+. The v1 would be
more suitable as it does not remove the unused
btrfs_folio_start_writer_lock(). As it's an exported function there should
be no warning.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Remove the unused btrfs_folio_start_writer_lock() function
> ---
>  fs/btrfs/extent_io.c | 15 +++++++-------
>  fs/btrfs/subpage.c   | 47 --------------------------------------------
>  fs/btrfs/subpage.h   |  2 --
>  3 files changed, 7 insertions(+), 57 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e0b43640849e..0448cee2b983 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -262,22 +262,21 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>
>  	for (i = 0; i < found_folios; i++) {
>  		struct folio *folio = fbatch.folios[i];
> -		u32 len = end + 1 - start;
> +		u64 range_start = max_t(u64, folio_pos(folio), start);
> +		u32 range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
> +				      end + 1) - range_start;
>
>  		if (folio == locked_folio)
>  			continue;
>
> -		if (btrfs_folio_start_writer_lock(fs_info, folio, start,
> -						  len))
> -			goto out;
> -
> +		folio_lock(folio);
>  		if (!folio_test_dirty(folio) || folio->mapping != mapping) {

Should the range_start and range_len be set after folio_lock? One of the
problems you're fixing is the folio and mapping mismatch, if I read it
correctly.
On 2024/10/9 03:28, David Sterba wrote:
> On Mon, Oct 07, 2024 at 09:51:40AM +1030, Qu Wenruo wrote:
>> Inside lock_delalloc_folios(), there are several problems related to
>> sector size < page size handling:
[...]
>> Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")
>
> This commit is in 5.14, so we'd need a backport to 5.15+. The v1 would be
> more suitable as it does not remove the unused
> btrfs_folio_start_writer_lock(). As it's an exported function there should
> be no warning.
[...]
> Should the range_start and range_len be set after folio_lock? One of the
> problems you're fixing is the folio and mapping mismatch, if I read it
> correctly.

Nice catch, indeed that's the problem.

Will fix it and split out the cleanup in the next version.

Thanks,
Qu
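For reference, the reordered loop body being discussed would look roughly like the sketch below (an illustration of the agreed fix, not the actual v3 patch): the folio is locked and validated against the mapping first, and only then are folio_pos()/folio_size() consulted to clamp the range.

```c
		/* Sketch only: lock first, validate, then compute the clamped range. */
		folio_lock(folio);
		if (!folio_test_dirty(folio) || folio->mapping != mapping) {
			folio_unlock(folio);
			goto out;
		}
		/*
		 * Safe to use folio_pos()/folio_size() now that the folio is
		 * locked and known to still belong to this mapping.
		 */
		range_start = max_t(u64, folio_pos(folio), start);
		range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
				  end + 1) - range_start;
		btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len);
		processed_end = range_start + range_len - 1;
```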
```diff
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e0b43640849e..0448cee2b983 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -262,22 +262,21 @@ static noinline int lock_delalloc_folios(struct inode *inode,
 
 	for (i = 0; i < found_folios; i++) {
 		struct folio *folio = fbatch.folios[i];
-		u32 len = end + 1 - start;
+		u64 range_start = max_t(u64, folio_pos(folio), start);
+		u32 range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
+				      end + 1) - range_start;
 
 		if (folio == locked_folio)
 			continue;
 
-		if (btrfs_folio_start_writer_lock(fs_info, folio, start,
-						  len))
-			goto out;
-
+		folio_lock(folio);
 		if (!folio_test_dirty(folio) || folio->mapping != mapping) {
-			btrfs_folio_end_writer_lock(fs_info, folio, start,
-						    len);
+			folio_unlock(folio);
 			goto out;
 		}
+		btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len);
 
-		processed_end = folio_pos(folio) + folio_size(folio) - 1;
+		processed_end = range_start + range_len - 1;
 	}
 	folio_batch_release(&fbatch);
 	cond_resched();
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index f98ce74b450e..3c1e34ddda74 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -333,26 +333,6 @@ static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len)
 		     orig_start + orig_len) - *start;
 }
 
-static void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info,
-				       struct folio *folio, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = folio_get_private(folio);
-	const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
-	const int nbits = (len >> fs_info->sectorsize_bits);
-	unsigned long flags;
-	int ret;
-
-	btrfs_subpage_assert(fs_info, folio, start, len);
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	ASSERT(atomic_read(&subpage->readers) == 0);
-	assert_bitmap_range_all_zero(fs_info, folio, start, len);
-	bitmap_set(subpage->bitmaps, start_bit, nbits);
-	ret = atomic_add_return(nbits, &subpage->writers);
-	ASSERT(ret == nbits);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
 static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info,
 					      struct folio *folio, u64 start, u32 len)
 {
@@ -389,33 +369,6 @@ static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_inf
 	return last;
 }
 
-/*
- * Lock a folio for delalloc page writeback.
- *
- * Return -EAGAIN if the page is not properly initialized.
- * Return 0 with the page locked, and writer counter updated.
- *
- * Even with 0 returned, the page still need extra check to make sure
- * it's really the correct page, as the caller is using
- * filemap_get_folios_contig(), which can race with page invalidating.
- */
-int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
-				  struct folio *folio, u64 start, u32 len)
-{
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping)) {
-		folio_lock(folio);
-		return 0;
-	}
-	folio_lock(folio);
-	if (!folio_test_private(folio) || !folio_get_private(folio)) {
-		folio_unlock(folio);
-		return -EAGAIN;
-	}
-	btrfs_subpage_clamp_range(folio, &start, &len);
-	btrfs_subpage_start_writer(fs_info, folio, start, len);
-	return 0;
-}
-
 /*
  * Handle different locked folios:
  *
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 48a481b327ca..cbee866152de 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -100,8 +100,6 @@ void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info,
 				struct folio *folio, u64 start, u32 len);
 void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info,
 			      struct folio *folio, u64 start, u32 len);
-int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
-				  struct folio *folio, u64 start, u32 len);
 void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info,
 				 struct folio *folio, u64 start, u32 len);
 void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
```
Inside lock_delalloc_folios(), there are several problems related to
sector size < page size handling:

- Set the writer locks without checking if the folio is still valid
  We call btrfs_folio_start_writer_lock() just like it's folio_lock().
  But since the folio may not even be the folio of the current mapping,
  we can easily screw up the folio->private.

- The range is not clamped inside the page
  This means we can overwrite other bitmaps if the start/len is not
  properly handled, and trigger the btrfs_subpage_assert().

- @processed_end is always rounded up to the page end
  If the delalloc range is not page aligned, and we need to retry
  (returning -EAGAIN), then we will unlock to the page end.

  Thankfully this is not a huge problem, as now
  btrfs_folio_end_writer_lock() can handle ranges larger than the locked
  range, and only unlock what is already locked.

Fix all these problems by:

- Lock and check the folio first, then call
  btrfs_folio_set_writer_lock()
  So that if we get a folio not belonging to the inode, we won't
  touch folio->private.

- Properly truncate the range inside the page

- Update @processed_end to the locked range end

- Remove btrfs_folio_start_writer_lock()
  Since all call sites only use btrfs_folio_set_writer_lock() now,
  remove the unnecessary btrfs_folio_start_writer_lock().

Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Remove the unused btrfs_folio_start_writer_lock() function
---
 fs/btrfs/extent_io.c | 15 +++++++-------
 fs/btrfs/subpage.c   | 47 --------------------------------------------
 fs/btrfs/subpage.h   |  2 --
 3 files changed, 7 insertions(+), 57 deletions(-)
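To make the clamping concrete, here is a small user-space model of the arithmetic the patch introduces. The helper names, the 16K folio at byte offset 65536, and the 4K sector offsets are all illustrative; folio_start/folio_len stand in for folio_pos(folio) and folio_size(folio), and the delalloc range is the inclusive byte range [start, end].

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* range_start = max_t(u64, folio_pos(folio), start) */
static u64 clamp_range_start(u64 folio_start, u64 start)
{
	return start > folio_start ? start : folio_start;
}

/* range_len = min_t(u64, folio_pos(folio) + folio_size(folio), end + 1) - range_start */
static u32 clamp_range_len(u64 folio_start, u64 folio_len, u64 start, u64 end)
{
	u64 range_start = clamp_range_start(folio_start, start);
	u64 folio_end = folio_start + folio_len;	/* exclusive */
	u64 range_end = (end + 1) < folio_end ? (end + 1) : folio_end;

	return (u32)(range_end - range_start);
}
```

The clamped [range_start, range_start + range_len) always lies inside the folio, so the subpage bitmap updates can no longer spill into neighbouring ranges, and processed_end = range_start + range_len - 1 matches exactly what was locked.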