
[v3,2/4] iomap: lift zeroed mapping handling into iomap_zero_range()

Message ID 20241108124246.198489-3-bfoster@redhat.com (mailing list archive)
State New
Series iomap: zero range flush fixes

Commit Message

Brian Foster Nov. 8, 2024, 12:42 p.m. UTC
In preparation for special handling of subranges, lift the zeroed
mapping logic from the iterator into the caller. Since this puts the
pagecache dirty check and flushing in the same place, streamline the
comments a bit as well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 42 deletions(-)

Comments

Darrick J. Wong Nov. 9, 2024, 3:01 a.m. UTC | #1
On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> In preparation for special handling of subranges, lift the zeroed
> mapping logic from the iterator into the caller. Since this puts the
> pagecache dirty check and flushing in the same place, streamline the
> comments a bit as well.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
>  1 file changed, 22 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ef0b68bccbb6..a78b5b9b3df3 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
>  	return filemap_write_and_wait_range(mapping, i->pos, end);
>  }
>  
> -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> -		bool *range_dirty)
> +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  {
> -	const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  	loff_t pos = iter->pos;
>  	loff_t length = iomap_length(iter);
>  	loff_t written = 0;
>  
> -	/*
> -	 * We must zero subranges of unwritten mappings that might be dirty in
> -	 * pagecache from previous writes. We only know whether the entire range
> -	 * was clean or not, however, and dirty folios may have been written
> -	 * back or reclaimed at any point after mapping lookup.
> -	 *
> -	 * The easiest way to deal with this is to flush pagecache to trigger
> -	 * any pending unwritten conversions and then grab the updated extents
> -	 * from the fs. The flush may change the current mapping, so mark it
> -	 * stale for the iterator to remap it for the next pass to handle
> -	 * properly.
> -	 *
> -	 * Note that holes are treated the same as unwritten because zero range
> -	 * is (ab)used for partial folio zeroing in some cases. Hole backed
> -	 * post-eof ranges can be dirtied via mapped write and the flush
> -	 * triggers writeback time post-eof zeroing.
> -	 */
> -	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> -		if (*range_dirty) {
> -			*range_dirty = false;
> -			return iomap_zero_iter_flush_and_stale(iter);
> -		}
> -		/* range is clean and already zeroed, nothing to do */
> -		return length;
> -	}
> -
>  	do {
>  		struct folio *folio;
>  		int status;
> @@ -1433,24 +1405,32 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>  	bool range_dirty;
>  
>  	/*
> -	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> -	 * pagecache must be flushed to ensure stale data from previous
> -	 * buffered writes is not exposed. A flush is only required for certain
> -	 * types of mappings, but checking pagecache after mapping lookup is
> -	 * racy with writeback and reclaim.
> +	 * Zero range can skip mappings that are zero on disk so long as
> +	 * pagecache is clean. If pagecache was dirty prior to zero range, the
> +	 * mapping converts on writeback completion and so must be zeroed.
>  	 *
> -	 * Therefore, check the entire range first and pass along whether any
> -	 * part of it is dirty. If so and an underlying mapping warrants it,
> -	 * flush the cache at that point. This trades off the occasional false
> -	 * positive (and spurious flush, if the dirty data and mapping don't
> -	 * happen to overlap) for simplicity in handling a relatively uncommon
> -	 * situation.
> +	 * The simplest way to deal with this across a range is to flush
> +	 * pagecache and process the updated mappings. To avoid an unconditional
> +	 * flush, check pagecache state and only flush if dirty and the fs
> +	 * returns a mapping that might convert on writeback.
>  	 */
>  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
>  					pos, pos + len - 1);
> +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +		const struct iomap *s = iomap_iter_srcmap(&iter);
> +
> +		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> +			loff_t p = iomap_length(&iter);

Another dumb nit: blank line after the declaration.

With that fixed, this is ok by me for further testing:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +			if (range_dirty) {
> +				range_dirty = false;
> +				p = iomap_zero_iter_flush_and_stale(&iter);
> +			}
> +			iter.processed = p;
> +			continue;
> +		}
>  
> -	while ((ret = iomap_iter(&iter, ops)) > 0)
> -		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
> +		iter.processed = iomap_zero_iter(&iter, did_zero);
> +	}
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_zero_range);
> -- 
> 2.47.0
> 
>
Christoph Hellwig Nov. 11, 2024, 6:03 a.m. UTC | #2
On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> In preparation for special handling of subranges, lift the zeroed
> mapping logic from the iterator into the caller.

What's that special handling?  I don't really see anything added for
it in the new code.  In general I would prefer that all code for the
iteration be kept in a single function, in preparation for
unrolling these loops.  If you want to keep this code separate
from the write zeroes logic (which seems like a good idea), please
just move the actual zeroing out of iomap_zero_iter into
a separate helper, similar to how we have multiple different
implementations in the dio iterator.
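
[For illustration, a rough sketch of the factoring being suggested here. It is not code from the posted series: iomap_zero_iter_folios is a hypothetical helper standing in for the existing folio zeroing do/while loop, and only the dispatch shape is the point.]

/*
 * Illustrative sketch, not from the posted series: keep the
 * per-mapping-type dispatch in the iterator function and move
 * only the real folio zeroing into a helper, similar to how
 * the dio iterator dispatches to different implementations.
 */
static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
		bool *range_dirty)
{
	const struct iomap *srcmap = iomap_iter_srcmap(iter);

	/* Holes and unwritten extents are already zero on disk. */
	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
		if (*range_dirty) {
			/* Dirty pagecache may convert the mapping on
			 * writeback; flush and revalidate instead. */
			*range_dirty = false;
			return iomap_zero_iter_flush_and_stale(iter);
		}
		return iomap_length(iter);
	}

	/* Hypothetical helper holding the current folio zeroing loop. */
	return iomap_zero_iter_folios(iter, did_zero);
}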

> +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> +		const struct iomap *s = iomap_iter_srcmap(&iter);
> +
> +		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> +			loff_t p = iomap_length(&iter);

Also please stick to variable names that are readable and preferably
the same as in the surrounding code, e.g. s -> srcmap, p -> pos.
Brian Foster Nov. 12, 2024, 1:59 p.m. UTC | #3
On Fri, Nov 08, 2024 at 07:01:27PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > In preparation for special handling of subranges, lift the zeroed
> > mapping logic from the iterator into the caller. Since this puts the
> > pagecache dirty check and flushing in the same place, streamline the
> > comments a bit as well.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/iomap/buffered-io.c | 64 +++++++++++++++---------------------------
> >  1 file changed, 22 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index ef0b68bccbb6..a78b5b9b3df3 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1350,40 +1350,12 @@ static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
> >  	return filemap_write_and_wait_range(mapping, i->pos, end);
> >  }
> >  
> > -static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
> > -		bool *range_dirty)
> > +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
> >  {
> > -	const struct iomap *srcmap = iomap_iter_srcmap(iter);
> >  	loff_t pos = iter->pos;
> >  	loff_t length = iomap_length(iter);
> >  	loff_t written = 0;
> >  
> > -	/*
> > -	 * We must zero subranges of unwritten mappings that might be dirty in
> > -	 * pagecache from previous writes. We only know whether the entire range
> > -	 * was clean or not, however, and dirty folios may have been written
> > -	 * back or reclaimed at any point after mapping lookup.
> > -	 *
> > -	 * The easiest way to deal with this is to flush pagecache to trigger
> > -	 * any pending unwritten conversions and then grab the updated extents
> > -	 * from the fs. The flush may change the current mapping, so mark it
> > -	 * stale for the iterator to remap it for the next pass to handle
> > -	 * properly.
> > -	 *
> > -	 * Note that holes are treated the same as unwritten because zero range
> > -	 * is (ab)used for partial folio zeroing in some cases. Hole backed
> > -	 * post-eof ranges can be dirtied via mapped write and the flush
> > -	 * triggers writeback time post-eof zeroing.
> > -	 */
> > -	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
> > -		if (*range_dirty) {
> > -			*range_dirty = false;
> > -			return iomap_zero_iter_flush_and_stale(iter);
> > -		}
> > -		/* range is clean and already zeroed, nothing to do */
> > -		return length;
> > -	}
> > -
> >  	do {
> >  		struct folio *folio;
> >  		int status;
> > @@ -1433,24 +1405,32 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
> >  	bool range_dirty;
> >  
> >  	/*
> > -	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
> > -	 * pagecache must be flushed to ensure stale data from previous
> > -	 * buffered writes is not exposed. A flush is only required for certain
> > -	 * types of mappings, but checking pagecache after mapping lookup is
> > -	 * racy with writeback and reclaim.
> > +	 * Zero range can skip mappings that are zero on disk so long as
> > +	 * pagecache is clean. If pagecache was dirty prior to zero range, the
> > +	 * mapping converts on writeback completion and so must be zeroed.
> >  	 *
> > -	 * Therefore, check the entire range first and pass along whether any
> > -	 * part of it is dirty. If so and an underlying mapping warrants it,
> > -	 * flush the cache at that point. This trades off the occasional false
> > -	 * positive (and spurious flush, if the dirty data and mapping don't
> > -	 * happen to overlap) for simplicity in handling a relatively uncommon
> > -	 * situation.
> > +	 * The simplest way to deal with this across a range is to flush
> > +	 * pagecache and process the updated mappings. To avoid an unconditional
> > +	 * flush, check pagecache state and only flush if dirty and the fs
> > +	 * returns a mapping that might convert on writeback.
> >  	 */
> >  	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
> >  					pos, pos + len - 1);
> > +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> > +		const struct iomap *s = iomap_iter_srcmap(&iter);
> > +
> > +		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > +			loff_t p = iomap_length(&iter);
> 
> Another dumb nit: blank line after the declaration.
> 

Fixed.

> With that fixed, this is ok by me for further testing:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 

Thanks.

Brian

> --D
> 
> > +			if (range_dirty) {
> > +				range_dirty = false;
> > +				p = iomap_zero_iter_flush_and_stale(&iter);
> > +			}
> > +			iter.processed = p;
> > +			continue;
> > +		}
> >  
> > -	while ((ret = iomap_iter(&iter, ops)) > 0)
> > -		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
> > +		iter.processed = iomap_zero_iter(&iter, did_zero);
> > +	}
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(iomap_zero_range);
> > -- 
> > 2.47.0
> > 
> > 
>
Brian Foster Nov. 12, 2024, 2 p.m. UTC | #4
On Sun, Nov 10, 2024 at 10:03:44PM -0800, Christoph Hellwig wrote:
> On Fri, Nov 08, 2024 at 07:42:44AM -0500, Brian Foster wrote:
> > In preparation for special handling of subranges, lift the zeroed
> > mapping logic from the iterator into the caller.
> 
> What's that special handling?  I don't really see anything added for
> it in the new code.  In general I would prefer that all code for the
> iteration be kept in a single function, in preparation for
> unrolling these loops.  If you want to keep this code separate
> from the write zeroes logic (which seems like a good idea), please
> just move the actual zeroing out of iomap_zero_iter into
> a separate helper, similar to how we have multiple different
> implementations in the dio iterator.
> 

There is no special code... the special treatment is to check the dirty
state of a block-unaligned start in isolation, to decide whether to skip
it or explicitly zero it if dirty. The fallback logic is to check the
dirty state of the entire range and, if needed, flush the mapping to
push all pending (dirty && unwritten) instances out to the fs so the
iomap is up to date and we can safely skip mappings that are inherently
zero on disk.
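
[As a rough sketch of that idea (hypothetical names and structure, not code from this series), the unaligned start could be checked in isolation before falling back to the whole-range flush; pos, length, blocksize and the surrounding iter context are assumed.]

/*
 * Illustrative sketch only: if the zero range starts block-unaligned
 * over an unwritten mapping, check just that leading subrange. If it
 * is dirty, zero it explicitly through the folio path; if clean, it
 * is already zero on disk and can be skipped.
 */
if (srcmap->type == IOMAP_UNWRITTEN && !IS_ALIGNED(pos, blocksize)) {
	loff_t plen = min_t(loff_t, length,
			    blocksize - (pos & (blocksize - 1)));

	if (filemap_range_needs_writeback(inode->i_mapping, pos,
					  pos + plen - 1))
		return iomap_zero_iter(iter, did_zero); /* zero explicitly */
	return plen;	/* clean and unwritten: skip */
}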

Hmm.. so I see the multiple iter modes for dio, but it looks like that
is inherent to the mapping type. That's not quite what I'm doing here,
so I'm not totally clear on what you're asking for. FWIW, I swizzled
this code around a few times and never found something I'd consider
elegant. For example, initial versions had something like another param
to iomap_zero_iter() to skip the optimization logic (i.e. don't skip
zeroed extents for this call), which I think is more in the spirit of
what you're saying, but I ultimately found it cleaner to open code that
part. If you had something else in mind, could you share some pseudocode
or something to show the factoring?

> > +	while ((ret = iomap_iter(&iter, ops)) > 0) {
> > +		const struct iomap *s = iomap_iter_srcmap(&iter);
> > +
> > +		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
> > +			loff_t p = iomap_length(&iter);
> 
> Also please stick to variable names that are readable and preferably
> the same as in the surrounding code, e.g. s -> srcmap, p -> pos.
> 

Sure. I think I did this to avoid long lines, but I can change it.
Thanks.

Brian

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ef0b68bccbb6..a78b5b9b3df3 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1350,40 +1350,12 @@  static inline int iomap_zero_iter_flush_and_stale(struct iomap_iter *i)
 	return filemap_write_and_wait_range(mapping, i->pos, end);
 }
 
-static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero,
-		bool *range_dirty)
+static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-	const struct iomap *srcmap = iomap_iter_srcmap(iter);
 	loff_t pos = iter->pos;
 	loff_t length = iomap_length(iter);
 	loff_t written = 0;
 
-	/*
-	 * We must zero subranges of unwritten mappings that might be dirty in
-	 * pagecache from previous writes. We only know whether the entire range
-	 * was clean or not, however, and dirty folios may have been written
-	 * back or reclaimed at any point after mapping lookup.
-	 *
-	 * The easiest way to deal with this is to flush pagecache to trigger
-	 * any pending unwritten conversions and then grab the updated extents
-	 * from the fs. The flush may change the current mapping, so mark it
-	 * stale for the iterator to remap it for the next pass to handle
-	 * properly.
-	 *
-	 * Note that holes are treated the same as unwritten because zero range
-	 * is (ab)used for partial folio zeroing in some cases. Hole backed
-	 * post-eof ranges can be dirtied via mapped write and the flush
-	 * triggers writeback time post-eof zeroing.
-	 */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) {
-		if (*range_dirty) {
-			*range_dirty = false;
-			return iomap_zero_iter_flush_and_stale(iter);
-		}
-		/* range is clean and already zeroed, nothing to do */
-		return length;
-	}
-
 	do {
 		struct folio *folio;
 		int status;
@@ -1433,24 +1405,32 @@  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 	bool range_dirty;
 
 	/*
-	 * Zero range wants to skip pre-zeroed (i.e. unwritten) mappings, but
-	 * pagecache must be flushed to ensure stale data from previous
-	 * buffered writes is not exposed. A flush is only required for certain
-	 * types of mappings, but checking pagecache after mapping lookup is
-	 * racy with writeback and reclaim.
+	 * Zero range can skip mappings that are zero on disk so long as
+	 * pagecache is clean. If pagecache was dirty prior to zero range, the
+	 * mapping converts on writeback completion and so must be zeroed.
 	 *
-	 * Therefore, check the entire range first and pass along whether any
-	 * part of it is dirty. If so and an underlying mapping warrants it,
-	 * flush the cache at that point. This trades off the occasional false
-	 * positive (and spurious flush, if the dirty data and mapping don't
-	 * happen to overlap) for simplicity in handling a relatively uncommon
-	 * situation.
+	 * The simplest way to deal with this across a range is to flush
+	 * pagecache and process the updated mappings. To avoid an unconditional
+	 * flush, check pagecache state and only flush if dirty and the fs
+	 * returns a mapping that might convert on writeback.
 	 */
 	range_dirty = filemap_range_needs_writeback(inode->i_mapping,
 					pos, pos + len - 1);
+	while ((ret = iomap_iter(&iter, ops)) > 0) {
+		const struct iomap *s = iomap_iter_srcmap(&iter);
+
+		if (s->type == IOMAP_HOLE || s->type == IOMAP_UNWRITTEN) {
+			loff_t p = iomap_length(&iter);
+			if (range_dirty) {
+				range_dirty = false;
+				p = iomap_zero_iter_flush_and_stale(&iter);
+			}
+			iter.processed = p;
+			continue;
+		}
 
-	while ((ret = iomap_iter(&iter, ops)) > 0)
-		iter.processed = iomap_zero_iter(&iter, did_zero, &range_dirty);
+		iter.processed = iomap_zero_iter(&iter, did_zero);
+	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_zero_range);
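
[For context, a minimal sketch of how a filesystem might invoke this path. It is illustrative only: example_zero_range and example_iomap_ops are placeholder names for the fs's wrapper and its buffered-write iomap ops, not real symbols.]

/*
 * Illustrative caller sketch, not part of the patch: a filesystem
 * zeroes a byte range by handing iomap_zero_range() its buffered
 * I/O iomap ops.
 */
static int example_zero_range(struct inode *inode, loff_t pos, loff_t len,
		bool *did_zero)
{
	return iomap_zero_range(inode, pos, len, did_zero,
			&example_iomap_ops);
}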