
[v2] btrfs: fix the delalloc range locking if sector size < page size

Message ID 305e53f3931e164c8ab3b09c059ea68bd792a4b2.1728256845.git.wqu@suse.com (mailing list archive)
State New, archived
Series [v2] btrfs: fix the delalloc range locking if sector size < page size

Commit Message

Qu Wenruo Oct. 6, 2024, 11:21 p.m. UTC
Inside lock_delalloc_folios(), there are several problems related to
sector size < page size handling:

- Set the writer locks without checking if the folio is still valid
  We call btrfs_folio_start_writer_lock() as if it were folio_lock(),
  but since the folio may not even belong to the current mapping,
  we can easily corrupt folio->private.

- The range is not clamped inside the page
  This means we can overwrite other bitmaps if the start/len is not
  properly handled, and trigger btrfs_subpage_assert().

- @processed_end is always rounded up to the page end
  If the delalloc range is not page aligned and we need to retry
  (returning -EAGAIN), then we will unlock up to the page end.

  Thankfully this is not a huge problem, as
  btrfs_folio_end_writer_lock() can now handle a range larger than the
  locked range, and only unlocks what is already locked.

Fix all these problems by:

- Lock and check the folio first, then call
  btrfs_folio_set_writer_lock()
  So that if we get a folio not belonging to the inode, we won't
  touch folio->private.

- Properly truncate the range inside the page

- Update @processed_end to the locked range end

- Remove btrfs_folio_start_writer_lock()
  Since all call sites now use btrfs_folio_set_writer_lock(),
  remove the now-unused btrfs_folio_start_writer_lock().
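The clamping described above can be sketched in plain user-space C. Note
this is an illustration, not kernel code: folio_pos/folio_size are passed
as plain integers standing in for the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a delalloc range [start, end] (both inclusive) to one folio.
 * The overlap starts at the later of the two starts, and never crosses
 * the folio boundary, so the per-folio bitmap range stays in bounds.
 */
static void clamp_range(uint64_t folio_pos, uint64_t folio_size,
			uint64_t start, uint64_t end,
			uint64_t *range_start, uint32_t *range_len)
{
	uint64_t folio_end = folio_pos + folio_size;	/* exclusive */

	/* Later of the folio start and the delalloc start. */
	*range_start = start > folio_pos ? start : folio_pos;
	/* Earlier of the folio end and the delalloc end (exclusive). */
	*range_len = (uint32_t)((end + 1 < folio_end ? end + 1 : folio_end)
				- *range_start);
}
```

With sector size < page size, a delalloc range can start or end inside a
folio, so both ends need this clamping before touching the subpage bitmap.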

Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Remove the unused btrfs_folio_start_writer_lock() function
---
 fs/btrfs/extent_io.c | 15 +++++++-------
 fs/btrfs/subpage.c   | 47 --------------------------------------------
 fs/btrfs/subpage.h   |  2 --
 3 files changed, 7 insertions(+), 57 deletions(-)

Comments

David Sterba Oct. 8, 2024, 4:58 p.m. UTC | #1
On Mon, Oct 07, 2024 at 09:51:40AM +1030, Qu Wenruo wrote:
> Inside lock_delalloc_folios(), there are several problems related to
> sector size < page size handling:
> 
> - Set the writer locks without checking if the folio is still valid
>   We call btrfs_folio_start_writer_lock() as if it were folio_lock(),
>   but since the folio may not even belong to the current mapping,
>   we can easily corrupt folio->private.
> 
> - The range is not clamped inside the page
>   This means we can overwrite other bitmaps if the start/len is not
>   properly handled, and trigger btrfs_subpage_assert().
> 
> - @processed_end is always rounded up to the page end
>   If the delalloc range is not page aligned and we need to retry
>   (returning -EAGAIN), then we will unlock up to the page end.
> 
>   Thankfully this is not a huge problem, as
>   btrfs_folio_end_writer_lock() can now handle a range larger than the
>   locked range, and only unlocks what is already locked.
> 
> Fix all these problems by:
> 
> - Lock and check the folio first, then call
>   btrfs_folio_set_writer_lock()
>   So that if we get a folio not belonging to the inode, we won't
>   touch folio->private.
> 
> - Properly truncate the range inside the page
> 
> - Update @processed_end to the locked range end
> 
> - Remove btrfs_folio_start_writer_lock()
>   Since all call sites now use btrfs_folio_set_writer_lock(),
>   remove the now-unused btrfs_folio_start_writer_lock().
> 
> Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")

This commit is in 5.14, so we'd need a backport to 5.15+. The v1 would be
more suitable as it does not remove the unused
btrfs_folio_start_writer_lock(). As it's an exported function, there should
be no warning.

> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Remove the unused btrfs_folio_start_writer_lock() function
> ---
>  fs/btrfs/extent_io.c | 15 +++++++-------
>  fs/btrfs/subpage.c   | 47 --------------------------------------------
>  fs/btrfs/subpage.h   |  2 --
>  3 files changed, 7 insertions(+), 57 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e0b43640849e..0448cee2b983 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -262,22 +262,21 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>  
>  		for (i = 0; i < found_folios; i++) {
>  			struct folio *folio = fbatch.folios[i];
> -			u32 len = end + 1 - start;
> +			u64 range_start = max_t(u64, folio_pos(folio), start);
> +			u32 range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
> +					      end + 1) - range_start;
>  
>  			if (folio == locked_folio)
>  				continue;
>  
> -			if (btrfs_folio_start_writer_lock(fs_info, folio, start,
> -							  len))
> -				goto out;
> -
> +			folio_lock(folio);
>  			if (!folio_test_dirty(folio) || folio->mapping != mapping) {

Should range_start and range_len be set after folio_lock()? One of the
problems you're fixing is the folio and mapping mismatch, if I read it
correctly.
Qu Wenruo Oct. 8, 2024, 8:58 p.m. UTC | #2
On 2024/10/9 03:28, David Sterba wrote:
> On Mon, Oct 07, 2024 at 09:51:40AM +1030, Qu Wenruo wrote:
>> Inside lock_delalloc_folios(), there are several problems related to
>> sector size < page size handling:
>>
>> - Set the writer locks without checking if the folio is still valid
>>    We call btrfs_folio_start_writer_lock() as if it were folio_lock(),
>>    but since the folio may not even belong to the current mapping,
>>    we can easily corrupt folio->private.
>>
>> - The range is not clamped inside the page
>>    This means we can overwrite other bitmaps if the start/len is not
>>    properly handled, and trigger btrfs_subpage_assert().
>>
>> - @processed_end is always rounded up to the page end
>>    If the delalloc range is not page aligned and we need to retry
>>    (returning -EAGAIN), then we will unlock up to the page end.
>>
>>    Thankfully this is not a huge problem, as
>>    btrfs_folio_end_writer_lock() can now handle a range larger than the
>>    locked range, and only unlocks what is already locked.
>>
>> Fix all these problems by:
>>
>> - Lock and check the folio first, then call
>>    btrfs_folio_set_writer_lock()
>>    So that if we get a folio not belonging to the inode, we won't
>>    touch folio->private.
>>
>> - Properly truncate the range inside the page
>>
>> - Update @processed_end to the locked range end
>>
>> - Remove btrfs_folio_start_writer_lock()
>>    Since all call sites now use btrfs_folio_set_writer_lock(),
>>    remove the now-unused btrfs_folio_start_writer_lock().
>>
>> Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking")
>
> This commit is in 5.14, so we'd need a backport to 5.15+. The v1 would be
> more suitable as it does not remove the unused
> btrfs_folio_start_writer_lock(). As it's an exported function, there should
> be no warning.
>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Remove the unused btrfs_folio_start_writer_lock() function
>> ---
>>   fs/btrfs/extent_io.c | 15 +++++++-------
>>   fs/btrfs/subpage.c   | 47 --------------------------------------------
>>   fs/btrfs/subpage.h   |  2 --
>>   3 files changed, 7 insertions(+), 57 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index e0b43640849e..0448cee2b983 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -262,22 +262,21 @@ static noinline int lock_delalloc_folios(struct inode *inode,
>>
>>   		for (i = 0; i < found_folios; i++) {
>>   			struct folio *folio = fbatch.folios[i];
>> -			u32 len = end + 1 - start;
>> +			u64 range_start = max_t(u64, folio_pos(folio), start);
>> +			u32 range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
>> +					      end + 1) - range_start;
>>
>>   			if (folio == locked_folio)
>>   				continue;
>>
>> -			if (btrfs_folio_start_writer_lock(fs_info, folio, start,
>> -							  len))
>> -				goto out;
>> -
>> +			folio_lock(folio);
>>   			if (!folio_test_dirty(folio) || folio->mapping != mapping) {
>
> Should range_start and range_len be set after folio_lock()? One of the
> problems you're fixing is the folio and mapping mismatch, if I read it
> correctly.
>
Nice catch, indeed that's the problem.

Will fix it and split out the cleanup in the next version.

Thanks,
Qu
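The ordering agreed on in this exchange (lock the folio, validate its
mapping and dirty state, and only then compute the clamped range) can be
modeled in user space. The struct and functions below are hypothetical
stand-ins, not kernel types or APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a folio; "mapping" is just an identity tag here. */
struct model_folio {
	uint64_t pos, size;
	const void *mapping;
	bool dirty;
	bool locked;
};

/*
 * One iteration of the delalloc locking loop, in the reviewed order:
 * lock first, reject a stale folio before touching any per-folio state,
 * then clamp the range and record the real locked end.
 */
static int process_one(struct model_folio *folio, const void *mapping,
		       uint64_t start, uint64_t end, uint64_t *processed_end)
{
	folio->locked = true;			/* folio_lock() */
	if (!folio->dirty || folio->mapping != mapping) {
		folio->locked = false;		/* folio_unlock() */
		return -1;			/* caller bails out and retries */
	}
	/* Range math happens only for a validated folio. */
	uint64_t range_start = start > folio->pos ? start : folio->pos;
	uint64_t folio_end = folio->pos + folio->size;	/* exclusive */
	uint32_t range_len = (uint32_t)((end + 1 < folio_end ? end + 1
							     : folio_end)
					- range_start);
	/* Locked range end, not rounded up to the page end. */
	*processed_end = range_start + range_len - 1;
	return 0;
}
```

The key point from the review is visible in the control flow: a folio whose
mapping changed under us is unlocked and rejected before any range
computation, so nothing derived from a stale folio ever reaches the bitmap.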

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e0b43640849e..0448cee2b983 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -262,22 +262,21 @@  static noinline int lock_delalloc_folios(struct inode *inode,
 
 		for (i = 0; i < found_folios; i++) {
 			struct folio *folio = fbatch.folios[i];
-			u32 len = end + 1 - start;
+			u64 range_start = max_t(u64, folio_pos(folio), start);
+			u32 range_len = min_t(u64, folio_pos(folio) + folio_size(folio),
+					      end + 1) - range_start;
 
 			if (folio == locked_folio)
 				continue;
 
-			if (btrfs_folio_start_writer_lock(fs_info, folio, start,
-							  len))
-				goto out;
-
+			folio_lock(folio);
 			if (!folio_test_dirty(folio) || folio->mapping != mapping) {
-				btrfs_folio_end_writer_lock(fs_info, folio, start,
-							    len);
+				folio_unlock(folio);
 				goto out;
 			}
+			btrfs_folio_set_writer_lock(fs_info, folio, range_start, range_len);
 
-			processed_end = folio_pos(folio) + folio_size(folio) - 1;
+			processed_end = range_start + range_len - 1;
 		}
 		folio_batch_release(&fbatch);
 		cond_resched();
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index f98ce74b450e..3c1e34ddda74 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -333,26 +333,6 @@  static void btrfs_subpage_clamp_range(struct folio *folio, u64 *start, u32 *len)
 			     orig_start + orig_len) - *start;
 }
 
-static void btrfs_subpage_start_writer(const struct btrfs_fs_info *fs_info,
-				       struct folio *folio, u64 start, u32 len)
-{
-	struct btrfs_subpage *subpage = folio_get_private(folio);
-	const int start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
-	const int nbits = (len >> fs_info->sectorsize_bits);
-	unsigned long flags;
-	int ret;
-
-	btrfs_subpage_assert(fs_info, folio, start, len);
-
-	spin_lock_irqsave(&subpage->lock, flags);
-	ASSERT(atomic_read(&subpage->readers) == 0);
-	assert_bitmap_range_all_zero(fs_info, folio, start, len);
-	bitmap_set(subpage->bitmaps, start_bit, nbits);
-	ret = atomic_add_return(nbits, &subpage->writers);
-	ASSERT(ret == nbits);
-	spin_unlock_irqrestore(&subpage->lock, flags);
-}
-
 static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_info,
 					      struct folio *folio, u64 start, u32 len)
 {
@@ -389,33 +369,6 @@  static bool btrfs_subpage_end_and_test_writer(const struct btrfs_fs_info *fs_inf
 	return last;
 }
 
-/*
- * Lock a folio for delalloc page writeback.
- *
- * Return -EAGAIN if the page is not properly initialized.
- * Return 0 with the page locked, and writer counter updated.
- *
- * Even with 0 returned, the page still need extra check to make sure
- * it's really the correct page, as the caller is using
- * filemap_get_folios_contig(), which can race with page invalidating.
- */
-int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
-				  struct folio *folio, u64 start, u32 len)
-{
-	if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping)) {
-		folio_lock(folio);
-		return 0;
-	}
-	folio_lock(folio);
-	if (!folio_test_private(folio) || !folio_get_private(folio)) {
-		folio_unlock(folio);
-		return -EAGAIN;
-	}
-	btrfs_subpage_clamp_range(folio, &start, &len);
-	btrfs_subpage_start_writer(fs_info, folio, start, len);
-	return 0;
-}
-
 /*
  * Handle different locked folios:
  *
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 48a481b327ca..cbee866152de 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -100,8 +100,6 @@  void btrfs_subpage_start_reader(const struct btrfs_fs_info *fs_info,
 void btrfs_subpage_end_reader(const struct btrfs_fs_info *fs_info,
 			      struct folio *folio, u64 start, u32 len);
 
-int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
-				  struct folio *folio, u64 start, u32 len);
 void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info,
 				 struct folio *folio, u64 start, u32 len);
 void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,