diff mbox series

[05/13] iomap: factor out a iomap_writepage_handle_eof helper

Message ID 20231126124720.1249310-6-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/13] iomap: clear the per-folio dirty bits on all writeback failures | expand

Commit Message

Christoph Hellwig Nov. 26, 2023, 12:47 p.m. UTC
Most of iomap_do_writepage is dedidcated to handling a folio crossing or
beyond i_size.  Split this is into a separate helper and update the
commens to deal with folios instead of pages and make them more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 66 deletions(-)

Comments

Ritesh Harjani (IBM) Nov. 27, 2023, 6:57 a.m. UTC | #1
Christoph Hellwig <hch@lst.de> writes:

> Most of iomap_do_writepage is dedidcated to handling a folio crossing or
> beyond i_size.  Split this is into a separate helper and update the
> commens to deal with folios instead of pages and make them more readable.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
>  1 file changed, 62 insertions(+), 66 deletions(-)

Just a small nit below.

But otherwise looks good to me. Please feel free to add - 

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8148e4c9765dac..4a5a21809b0182 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
>  }
>  
> +/*
> + * Check interaction of the folio with the file end.
> + *
> + * If the folio is entirely beyond i_size, return false.  If it straddles
> + * i_size, adjust end_pos and zero all data beyond i_size.
> + */
> +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
> +		u64 *end_pos)
> +{
> +	u64 isize = i_size_read(inode);

i_size_read(inode) returns loff_t type. Can we make end_pos also as
loff_t type. We anyway use loff_t for
folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
would be more consistent with the data type then.

Thoughts?

-ritesh


> +
> +	if (*end_pos > isize) {
> +		size_t poff = offset_in_folio(folio, isize);
> +		pgoff_t end_index = isize >> PAGE_SHIFT;
> +
> +		/*
> +		 * If the folio is entirely ouside of i_size, skip it.
> +		 *
> +		 * This can happen due to a truncate operation that is in
> +		 * progress and in that case truncate will finish it off once
> +		 * we've dropped the folio lock.
> +		 *
> +		 * Note that the pgoff_t used for end_index is an unsigned long.
> +		 * If the given offset is greater than 16TB on a 32-bit system,
> +		 * then if we checked if the folio is fully outside i_size with
> +		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
> +		 * overflow and evaluate to 0.  Hence this folio would be
> +		 * redirtied and written out repeatedly, which would result in
> +		 * an infinite loop; the user program performing this operation
> +		 * would hang.  Instead, we can detect this situation by
> +		 * checking if the folio is totally beyond i_size or if its
> +		 * offset is just equal to the EOF.
> +		 */
> +		if (folio->index > end_index ||
> +		    (folio->index == end_index && poff == 0))
> +			return false;
> +
> +		/*
> +		 * The folio straddles i_size.
> +		 *
> +		 * It must be zeroed out on each and every writepage invocation
> +		 * because it may be mmapped:
> +		 *
> +		 *    A file is mapped in multiples of the page size.  For a
> +		 *    file that is not a multiple of the page size, the
> +		 *    remaining memory is zeroed when mapped, and writes to that
> +		 *    region are not written out to the file.
> +		 *
> +		 * Also adjust the writeback range to skip all blocks entirely
> +		 * beyond i_size.
> +		 */
> +		folio_zero_segment(folio, poff, folio_size(folio));
> +		*end_pos = isize;
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * We implement an immediate ioend submission policy here to avoid needing to
>   * chain multiple ioends and hence nest mempool allocations which can violate
> @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio,
>  {
>  	struct iomap_writepage_ctx *wpc = data;
>  	struct inode *inode = folio->mapping->host;
> -	u64 end_pos, isize;
> +	u64 end_pos = folio_pos(folio) + folio_size(folio);
>  
>  	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
>  
> -	/*
> -	 * Is this folio beyond the end of the file?
> -	 *
> -	 * The folio index is less than the end_index, adjust the end_pos
> -	 * to the highest offset that this folio should represent.
> -	 * -----------------------------------------------------
> -	 * |			file mapping	       | <EOF> |
> -	 * -----------------------------------------------------
> -	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
> -	 * ^--------------------------------^----------|--------
> -	 * |     desired writeback range    |      see else    |
> -	 * ---------------------------------^------------------|
> -	 */
> -	isize = i_size_read(inode);
> -	end_pos = folio_pos(folio) + folio_size(folio);
> -	if (end_pos > isize) {
> -		/*
> -		 * Check whether the page to write out is beyond or straddles
> -		 * i_size or not.
> -		 * -------------------------------------------------------
> -		 * |		file mapping		        | <EOF>  |
> -		 * -------------------------------------------------------
> -		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
> -		 * ^--------------------------------^-----------|---------
> -		 * |				    |      Straddles     |
> -		 * ---------------------------------^-----------|--------|
> -		 */
> -		size_t poff = offset_in_folio(folio, isize);
> -		pgoff_t end_index = isize >> PAGE_SHIFT;
> -
> -		/*
> -		 * Skip the page if it's fully outside i_size, e.g.
> -		 * due to a truncate operation that's in progress.  We've
> -		 * cleaned this page and truncate will finish things off for
> -		 * us.
> -		 *
> -		 * Note that the end_index is unsigned long.  If the given
> -		 * offset is greater than 16TB on a 32-bit system then if we
> -		 * checked if the page is fully outside i_size with
> -		 * "if (page->index >= end_index + 1)", "end_index + 1" would
> -		 * overflow and evaluate to 0.  Hence this page would be
> -		 * redirtied and written out repeatedly, which would result in
> -		 * an infinite loop; the user program performing this operation
> -		 * would hang.  Instead, we can detect this situation by
> -		 * checking if the page is totally beyond i_size or if its
> -		 * offset is just equal to the EOF.
> -		 */
> -		if (folio->index > end_index ||
> -		    (folio->index == end_index && poff == 0))
> -			goto unlock;
> -
> -		/*
> -		 * The page straddles i_size.  It must be zeroed out on each
> -		 * and every writepage invocation because it may be mmapped.
> -		 * "A file is mapped in multiples of the page size.  For a file
> -		 * that is not a multiple of the page size, the remaining
> -		 * memory is zeroed when mapped, and writes to that region are
> -		 * not written out to the file."
> -		 */
> -		folio_zero_segment(folio, poff, folio_size(folio));
> -		end_pos = isize;
> +	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
> +		folio_unlock(folio);
> +		return 0;
>  	}
>  
>  	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
> -
> -unlock:
> -	folio_unlock(folio);
> -	return 0;
>  }
>  
>  int
> -- 
> 2.39.2
Ritesh Harjani (IBM) Nov. 27, 2023, 7:02 a.m. UTC | #2
Ritesh Harjani (IBM) <ritesh.list@gmail.com> writes:

> Christoph Hellwig <hch@lst.de> writes:
>
>> Most of iomap_do_writepage is dedidcated to handling a folio crossing or
>> beyond i_size.  Split this is into a separate helper and update the
>> commens to deal with folios instead of pages and make them more readable.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  fs/iomap/buffered-io.c | 128 ++++++++++++++++++++---------------------
>>  1 file changed, 62 insertions(+), 66 deletions(-)
>
> Just a small nit below.
>
> But otherwise looks good to me. Please feel free to add - 
>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 8148e4c9765dac..4a5a21809b0182 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1768,6 +1768,64 @@ iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
>>  	wbc_account_cgroup_owner(wbc, &folio->page, len);
>>  }
>>  
>> +/*
>> + * Check interaction of the folio with the file end.
>> + *
>> + * If the folio is entirely beyond i_size, return false.  If it straddles
>> + * i_size, adjust end_pos and zero all data beyond i_size.
>> + */
>> +static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
>> +		u64 *end_pos)
>> +{
>> +	u64 isize = i_size_read(inode);
>
> i_size_read(inode) returns loff_t type. Can we make end_pos also as
> loff_t type. We anyway use loff_t for
> folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> would be more consistent with the data type then.
>
> Thoughts?

aah, that might also require to change the types in
iomap_writepage_map(). So I guess the data type consistency change
should be a follow up change as this patch does only the refactoring.

>
> -ritesh
>
>
>> +
>> +	if (*end_pos > isize) {
>> +		size_t poff = offset_in_folio(folio, isize);
>> +		pgoff_t end_index = isize >> PAGE_SHIFT;
>> +
>> +		/*
>> +		 * If the folio is entirely ouside of i_size, skip it.
>> +		 *
>> +		 * This can happen due to a truncate operation that is in
>> +		 * progress and in that case truncate will finish it off once
>> +		 * we've dropped the folio lock.
>> +		 *
>> +		 * Note that the pgoff_t used for end_index is an unsigned long.
>> +		 * If the given offset is greater than 16TB on a 32-bit system,
>> +		 * then if we checked if the folio is fully outside i_size with
>> +		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
>> +		 * overflow and evaluate to 0.  Hence this folio would be
>> +		 * redirtied and written out repeatedly, which would result in
>> +		 * an infinite loop; the user program performing this operation
>> +		 * would hang.  Instead, we can detect this situation by
>> +		 * checking if the folio is totally beyond i_size or if its
>> +		 * offset is just equal to the EOF.
>> +		 */
>> +		if (folio->index > end_index ||
>> +		    (folio->index == end_index && poff == 0))
>> +			return false;
>> +
>> +		/*
>> +		 * The folio straddles i_size.
>> +		 *
>> +		 * It must be zeroed out on each and every writepage invocation
>> +		 * because it may be mmapped:
>> +		 *
>> +		 *    A file is mapped in multiples of the page size.  For a
>> +		 *    file that is not a multiple of the page size, the
>> +		 *    remaining memory is zeroed when mapped, and writes to that
>> +		 *    region are not written out to the file.
>> +		 *
>> +		 * Also adjust the writeback range to skip all blocks entirely
>> +		 * beyond i_size.
>> +		 */
>> +		folio_zero_segment(folio, poff, folio_size(folio));
>> +		*end_pos = isize;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>  /*
>>   * We implement an immediate ioend submission policy here to avoid needing to
>>   * chain multiple ioends and hence nest mempool allocations which can violate
>> @@ -1906,78 +1964,16 @@ static int iomap_do_writepage(struct folio *folio,
>>  {
>>  	struct iomap_writepage_ctx *wpc = data;
>>  	struct inode *inode = folio->mapping->host;
>> -	u64 end_pos, isize;
>> +	u64 end_pos = folio_pos(folio) + folio_size(folio);
>>  
>>  	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
>>  
>> -	/*
>> -	 * Is this folio beyond the end of the file?
>> -	 *
>> -	 * The folio index is less than the end_index, adjust the end_pos
>> -	 * to the highest offset that this folio should represent.
>> -	 * -----------------------------------------------------
>> -	 * |			file mapping	       | <EOF> |
>> -	 * -----------------------------------------------------
>> -	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
>> -	 * ^--------------------------------^----------|--------
>> -	 * |     desired writeback range    |      see else    |
>> -	 * ---------------------------------^------------------|
>> -	 */
>> -	isize = i_size_read(inode);
>> -	end_pos = folio_pos(folio) + folio_size(folio);
>> -	if (end_pos > isize) {
>> -		/*
>> -		 * Check whether the page to write out is beyond or straddles
>> -		 * i_size or not.
>> -		 * -------------------------------------------------------
>> -		 * |		file mapping		        | <EOF>  |
>> -		 * -------------------------------------------------------
>> -		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
>> -		 * ^--------------------------------^-----------|---------
>> -		 * |				    |      Straddles     |
>> -		 * ---------------------------------^-----------|--------|
>> -		 */
>> -		size_t poff = offset_in_folio(folio, isize);
>> -		pgoff_t end_index = isize >> PAGE_SHIFT;
>> -
>> -		/*
>> -		 * Skip the page if it's fully outside i_size, e.g.
>> -		 * due to a truncate operation that's in progress.  We've
>> -		 * cleaned this page and truncate will finish things off for
>> -		 * us.
>> -		 *
>> -		 * Note that the end_index is unsigned long.  If the given
>> -		 * offset is greater than 16TB on a 32-bit system then if we
>> -		 * checked if the page is fully outside i_size with
>> -		 * "if (page->index >= end_index + 1)", "end_index + 1" would
>> -		 * overflow and evaluate to 0.  Hence this page would be
>> -		 * redirtied and written out repeatedly, which would result in
>> -		 * an infinite loop; the user program performing this operation
>> -		 * would hang.  Instead, we can detect this situation by
>> -		 * checking if the page is totally beyond i_size or if its
>> -		 * offset is just equal to the EOF.
>> -		 */
>> -		if (folio->index > end_index ||
>> -		    (folio->index == end_index && poff == 0))
>> -			goto unlock;
>> -
>> -		/*
>> -		 * The page straddles i_size.  It must be zeroed out on each
>> -		 * and every writepage invocation because it may be mmapped.
>> -		 * "A file is mapped in multiples of the page size.  For a file
>> -		 * that is not a multiple of the page size, the remaining
>> -		 * memory is zeroed when mapped, and writes to that region are
>> -		 * not written out to the file."
>> -		 */
>> -		folio_zero_segment(folio, poff, folio_size(folio));
>> -		end_pos = isize;
>> +	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
>> +		folio_unlock(folio);
>> +		return 0;
>>  	}
>>  
>>  	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
>> -
>> -unlock:
>> -	folio_unlock(folio);
>> -	return 0;
>>  }
>>  
>>  int
>> -- 
>> 2.39.2
Christoph Hellwig Nov. 27, 2023, 7:12 a.m. UTC | #3
On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote:
> >
> > i_size_read(inode) returns loff_t type. Can we make end_pos also as
> > loff_t type. We anyway use loff_t for
> > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> > would be more consistent with the data type then.
> >
> > Thoughts?
> 
> aah, that might also require to change the types in
> iomap_writepage_map(). So I guess the data type consistency change
> should be a follow up change as this patch does only the refactoring.

Yes, I'm trying to stay consistent in the writeback code.  IIRC some
of the u64 use was to better deal with overflows, but I'll have to look
up the history.
Darrick J. Wong Nov. 29, 2023, 4:48 a.m. UTC | #4
On Mon, Nov 27, 2023 at 08:12:19AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 27, 2023 at 12:32:38PM +0530, Ritesh Harjani wrote:
> > >
> > > i_size_read(inode) returns loff_t type. Can we make end_pos also as
> > > loff_t type. We anyway use loff_t for
> > > folio_pos(folio) + folio_size(folio), at many places in fs/iomap. It
> > > would be more consistent with the data type then.
> > >
> > > Thoughts?
> > 
> > aah, that might also require to change the types in
> > iomap_writepage_map(). So I guess the data type consistency change
> > should be a follow up change as this patch does only the refactoring.

Separate patch for the cleanup, please.  Then we can more easily target
it for bisection in case there are signed comparison bugs.  I hate C.

> Yes, I'm trying to stay consistent in the writeback code.  IIRC some
> of the u64 use was to better deal with overflows, but I'll have to look
> up the history.

For this patch,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
>
diff mbox series

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8148e4c9765dac..4a5a21809b0182 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1768,6 +1768,64 @@  iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
 	wbc_account_cgroup_owner(wbc, &folio->page, len);
 }
 
+/*
+ * Check interaction of the folio with the file end.
+ *
+ * If the folio is entirely beyond i_size, return false.  If it straddles
+ * i_size, adjust end_pos and zero all data beyond i_size.
+ */
+static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
+		u64 *end_pos)
+{
+	u64 isize = i_size_read(inode);
+
+	if (*end_pos > isize) {
+		size_t poff = offset_in_folio(folio, isize);
+		pgoff_t end_index = isize >> PAGE_SHIFT;
+
+		/*
+		 * If the folio is entirely ouside of i_size, skip it.
+		 *
+		 * This can happen due to a truncate operation that is in
+		 * progress and in that case truncate will finish it off once
+		 * we've dropped the folio lock.
+		 *
+		 * Note that the pgoff_t used for end_index is an unsigned long.
+		 * If the given offset is greater than 16TB on a 32-bit system,
+		 * then if we checked if the folio is fully outside i_size with
+		 * "if (folio->index >= end_index + 1)", "end_index + 1" would
+		 * overflow and evaluate to 0.  Hence this folio would be
+		 * redirtied and written out repeatedly, which would result in
+		 * an infinite loop; the user program performing this operation
+		 * would hang.  Instead, we can detect this situation by
+		 * checking if the folio is totally beyond i_size or if its
+		 * offset is just equal to the EOF.
+		 */
+		if (folio->index > end_index ||
+		    (folio->index == end_index && poff == 0))
+			return false;
+
+		/*
+		 * The folio straddles i_size.
+		 *
+		 * It must be zeroed out on each and every writepage invocation
+		 * because it may be mmapped:
+		 *
+		 *    A file is mapped in multiples of the page size.  For a
+		 *    file that is not a multiple of the page size, the
+		 *    remaining memory is zeroed when mapped, and writes to that
+		 *    region are not written out to the file.
+		 *
+		 * Also adjust the writeback range to skip all blocks entirely
+		 * beyond i_size.
+		 */
+		folio_zero_segment(folio, poff, folio_size(folio));
+		*end_pos = isize;
+	}
+
+	return true;
+}
+
 /*
  * We implement an immediate ioend submission policy here to avoid needing to
  * chain multiple ioends and hence nest mempool allocations which can violate
@@ -1906,78 +1964,16 @@  static int iomap_do_writepage(struct folio *folio,
 {
 	struct iomap_writepage_ctx *wpc = data;
 	struct inode *inode = folio->mapping->host;
-	u64 end_pos, isize;
+	u64 end_pos = folio_pos(folio) + folio_size(folio);
 
 	trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
 
-	/*
-	 * Is this folio beyond the end of the file?
-	 *
-	 * The folio index is less than the end_index, adjust the end_pos
-	 * to the highest offset that this folio should represent.
-	 * -----------------------------------------------------
-	 * |			file mapping	       | <EOF> |
-	 * -----------------------------------------------------
-	 * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
-	 * ^--------------------------------^----------|--------
-	 * |     desired writeback range    |      see else    |
-	 * ---------------------------------^------------------|
-	 */
-	isize = i_size_read(inode);
-	end_pos = folio_pos(folio) + folio_size(folio);
-	if (end_pos > isize) {
-		/*
-		 * Check whether the page to write out is beyond or straddles
-		 * i_size or not.
-		 * -------------------------------------------------------
-		 * |		file mapping		        | <EOF>  |
-		 * -------------------------------------------------------
-		 * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
-		 * ^--------------------------------^-----------|---------
-		 * |				    |      Straddles     |
-		 * ---------------------------------^-----------|--------|
-		 */
-		size_t poff = offset_in_folio(folio, isize);
-		pgoff_t end_index = isize >> PAGE_SHIFT;
-
-		/*
-		 * Skip the page if it's fully outside i_size, e.g.
-		 * due to a truncate operation that's in progress.  We've
-		 * cleaned this page and truncate will finish things off for
-		 * us.
-		 *
-		 * Note that the end_index is unsigned long.  If the given
-		 * offset is greater than 16TB on a 32-bit system then if we
-		 * checked if the page is fully outside i_size with
-		 * "if (page->index >= end_index + 1)", "end_index + 1" would
-		 * overflow and evaluate to 0.  Hence this page would be
-		 * redirtied and written out repeatedly, which would result in
-		 * an infinite loop; the user program performing this operation
-		 * would hang.  Instead, we can detect this situation by
-		 * checking if the page is totally beyond i_size or if its
-		 * offset is just equal to the EOF.
-		 */
-		if (folio->index > end_index ||
-		    (folio->index == end_index && poff == 0))
-			goto unlock;
-
-		/*
-		 * The page straddles i_size.  It must be zeroed out on each
-		 * and every writepage invocation because it may be mmapped.
-		 * "A file is mapped in multiples of the page size.  For a file
-		 * that is not a multiple of the page size, the remaining
-		 * memory is zeroed when mapped, and writes to that region are
-		 * not written out to the file."
-		 */
-		folio_zero_segment(folio, poff, folio_size(folio));
-		end_pos = isize;
+	if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+		folio_unlock(folio);
+		return 0;
 	}
 
 	return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
-
-unlock:
-	folio_unlock(folio);
-	return 0;
 }
 
 int