Message ID | 20220503213727.3273873-1-agruenba@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iomap: iomap_write_end cleanup | expand |
On Tue, May 03, 2022 at 11:37:27PM +0200, Andreas Gruenbacher wrote: > In iomap_write_end(), only call iomap_write_failed() on the byte range > that has failed. This should improve code readability, but doesn't fix > an actual bug because iomap_write_failed() is called after updating the > file size here and it only affects the memory beyond the end of the > file. I can't find a way to set 'ret' to anything other than 0 or len. I know the code is written to make it look like we can return a short write, but I can't see a way to do it. > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> > --- > fs/iomap/buffered-io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 358ee1fb6f0d..8fb9b2797fc5 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -734,7 +734,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > folio_put(folio); > > if (ret < len) > - iomap_write_failed(iter->inode, pos, len); > + iomap_write_failed(iter->inode, pos + ret, len - ret); > return ret; > } > > -- > 2.35.1 >
On Tue, May 3, 2022 at 11:53 PM Matthew Wilcox <willy@infradead.org> wrote: > On Tue, May 03, 2022 at 11:37:27PM +0200, Andreas Gruenbacher wrote: > > In iomap_write_end(), only call iomap_write_failed() on the byte range > > that has failed. This should improve code readability, but doesn't fix > > an actual bug because iomap_write_failed() is called after updating the > > file size here and it only affects the memory beyond the end of the > > file. > > I can't find a way to set 'ret' to anything other than 0 or len. I know > the code is written to make it look like we can return a short write, > but I can't see a way to do it. Good point, but that doesn't make the code any less confusing in my eyes. Thanks, Andreas > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> > > --- > > fs/iomap/buffered-io.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > > index 358ee1fb6f0d..8fb9b2797fc5 100644 > > --- a/fs/iomap/buffered-io.c > > +++ b/fs/iomap/buffered-io.c > > @@ -734,7 +734,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > > folio_put(folio); > > > > if (ret < len) > > - iomap_write_failed(iter->inode, pos, len); > > + iomap_write_failed(iter->inode, pos + ret, len - ret); > > return ret; > > } > > > > -- > > 2.35.1 > > >
On Wed, May 04, 2022 at 12:15:45AM +0200, Andreas Gruenbacher wrote: > On Tue, May 3, 2022 at 11:53 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, May 03, 2022 at 11:37:27PM +0200, Andreas Gruenbacher wrote: > > > In iomap_write_end(), only call iomap_write_failed() on the byte range > > > that has failed. This should improve code readability, but doesn't fix > > > an actual bug because iomap_write_failed() is called after updating the > > > file size here and it only affects the memory beyond the end of the > > > file. > > > > I can't find a way to set 'ret' to anything other than 0 or len. I know > > the code is written to make it look like we can return a short write, > > but I can't see a way to do it. > > Good point, but that doesn't make the code any less confusing in my eyes. Not to mention it leaves a logic bomb if we ever /do/ start returning 0 < ret < len. Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > Thanks, > Andreas > > > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> > > > --- > > > fs/iomap/buffered-io.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > > > index 358ee1fb6f0d..8fb9b2797fc5 100644 > > > --- a/fs/iomap/buffered-io.c > > > +++ b/fs/iomap/buffered-io.c > > > @@ -734,7 +734,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > > > folio_put(folio); > > > > > > if (ret < len) > > > - iomap_write_failed(iter->inode, pos, len); > > > + iomap_write_failed(iter->inode, pos + ret, len - ret); > > > return ret; > > > } > > > > > > -- > > > 2.35.1 > > > > > >
On Tue, May 03, 2022 at 04:02:26PM -0700, Darrick J. Wong wrote: > On Wed, May 04, 2022 at 12:15:45AM +0200, Andreas Gruenbacher wrote: > > On Tue, May 3, 2022 at 11:53 PM Matthew Wilcox <willy@infradead.org> wrote: > > > On Tue, May 03, 2022 at 11:37:27PM +0200, Andreas Gruenbacher wrote: > > > > In iomap_write_end(), only call iomap_write_failed() on the byte range > > > > that has failed. This should improve code readability, but doesn't fix > > > > an actual bug because iomap_write_failed() is called after updating the > > > > file size here and it only affects the memory beyond the end of the > > > > file. > > > > > > I can't find a way to set 'ret' to anything other than 0 or len. I know > > > the code is written to make it look like we can return a short write, > > > but I can't see a way to do it. > > > > Good point, but that doesn't make the code any less confusing in my eyes. > > Not to mention it leaves a logic bomb if we ever /do/ start returning > 0 < ret < len. This is one of the things I noticed when folioising iomap and didn't get round to cleaning up, but I feel like we should change the calling convention here to bool (true = success, false = fail). Changing block_write_end() might not be on the cards, unless someone's really motivated, but we can at least change iomap_write_end() to not have this stupid calling convention. I mean, I won't NAK this patch, it is somewhat better with it than without it, but it perpetuates the myth that this is in some way ever going to happen, and the code could be a lot simpler if we stopped pretending.
On Wed, May 4, 2022 at 2:27 AM Matthew Wilcox <willy@infradead.org> wrote: > On Tue, May 03, 2022 at 04:02:26PM -0700, Darrick J. Wong wrote: > > On Wed, May 04, 2022 at 12:15:45AM +0200, Andreas Gruenbacher wrote: > > > On Tue, May 3, 2022 at 11:53 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Tue, May 03, 2022 at 11:37:27PM +0200, Andreas Gruenbacher wrote: > > > > > In iomap_write_end(), only call iomap_write_failed() on the byte range > > > > > that has failed. This should improve code readability, but doesn't fix > > > > > an actual bug because iomap_write_failed() is called after updating the > > > > > file size here and it only affects the memory beyond the end of the > > > > > file. > > > > > > > > I can't find a way to set 'ret' to anything other than 0 or len. I know > > > > the code is written to make it look like we can return a short write, > > > > but I can't see a way to do it. > > > > > > Good point, but that doesn't make the code any less confusing in my eyes. > > > > Not to mention it leaves a logic bomb if we ever /do/ start returning > > 0 < ret < len. > > This is one of the things I noticed when folioising iomap and didn't > get round to cleaning up, but I feel like we should change the calling > convention here to bool (true = success, false = fail). Changing > block_write_end() might not be on the cards, unless someone's really > motivated, but we can at least change iomap_write_end() to not have this > stupid calling convention. > > I mean, I won't NAK this patch, it is somewhat better with it than without > it, but it perpetuates the myth that this is in some way ever going to > happen, and the code could be a lot simpler if we stopped pretending. Hmm, I don't really see how this would make things significantly simpler. Trying it out made me notice the following problem, though. Any thoughts? Of course this was copied from generic_write_end(), and so we have the same issue there as well as in nobh_write_end(). Thanks, Andreas diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 8fb9b2797fc5..1938dbbda1c0 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -721,13 +721,13 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, * cache. It's up to the file system to write the updated size to disk, * preferably after I/O completion so that no stale data is exposed. */ - if (pos + ret > old_size) { + if (ret && pos + ret > old_size) { i_size_write(iter->inode, pos + ret); iter->iomap.flags |= IOMAP_F_SIZE_CHANGED; } folio_unlock(folio); - if (old_size < pos) + if (ret && old_size < pos) pagecache_isize_extended(iter->inode, old_size, pos); if (page_ops && page_ops->page_done) page_ops->page_done(iter->inode, pos, ret, &folio->page);
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Wed, May 04, 2022 at 01:27:36AM +0100, Matthew Wilcox wrote: > This is one of the things I noticed when folioising iomap and didn't > get round to cleaning up, but I feel like we should change the calling > convention here to bool (true = success, false = fail). Changing > block_write_end() might not be on the cards, unless someone's really > motivated, but we can at least change iomap_write_end() to not have this > stupid calling convention. I kinda hate the bools for something that is not a simple if (foo())) return; as propagating them is a bit of a mess. I do however thing that switching to 0 / -errno might work nicely here, completely untested patch below: diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 8bc0989cf447fa..764174e2f1a183 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -685,13 +685,13 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len, * redo the whole thing. */ if (unlikely(copied < len && !folio_test_uptodate(folio))) - return 0; + return -EIO; iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len); filemap_dirty_folio(inode->i_mapping, folio); - return copied; + return 0; } -static size_t iomap_write_end_inline(const struct iomap_iter *iter, +static void iomap_write_end_inline(const struct iomap_iter *iter, struct folio *folio, loff_t pos, size_t copied) { const struct iomap *iomap = &iter->iomap; @@ -706,23 +706,22 @@ static size_t iomap_write_end_inline(const struct iomap_iter *iter, kunmap_local(addr); mark_inode_dirty(iter->inode); - return copied; } -/* Returns the number of bytes copied. May be 0. Cannot be an errno. */ -static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, +static int iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, size_t copied, struct folio *folio) { const struct iomap_page_ops *page_ops = iter->iomap.page_ops; const struct iomap *srcmap = iomap_iter_srcmap(iter); loff_t old_size = iter->inode->i_size; - size_t ret; + int ret = 0; if (srcmap->type == IOMAP_INLINE) { - ret = iomap_write_end_inline(iter, folio, pos, copied); + iomap_write_end_inline(iter, folio, pos, copied); } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) { - ret = block_write_end(NULL, iter->inode->i_mapping, pos, len, - copied, &folio->page, NULL); + if (block_write_end(NULL, iter->inode->i_mapping, pos, len, + copied, &folio->page, NULL) < len) + ret = -EIO; } else { ret = __iomap_write_end(iter->inode, pos, len, copied, folio); } @@ -732,8 +731,8 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, * cache. It's up to the file system to write the updated size to disk, * preferably after I/O completion so that no stale data is exposed. */ - if (pos + ret > old_size) { - i_size_write(iter->inode, pos + ret); + if (!ret && pos + len > old_size) { + i_size_write(iter->inode, pos + len); iter->iomap.flags |= IOMAP_F_SIZE_CHANGED; } folio_unlock(folio); @@ -741,10 +740,11 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, if (old_size < pos) pagecache_isize_extended(iter->inode, old_size, pos); if (page_ops && page_ops->page_done) - page_ops->page_done(iter->inode, pos, ret, &folio->page); + page_ops->page_done(iter->inode, pos, ret ? ret : len, + &folio->page); folio_put(folio); - if (ret < len) + if (ret) iomap_write_failed(iter->inode, pos, len); return ret; } @@ -754,7 +754,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) loff_t length = iomap_length(iter); loff_t pos = iter->pos; ssize_t written = 0; - long status = 0; + int status = 0; do { struct folio *folio; @@ -792,26 +792,24 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) copied = copy_page_from_iter_atomic(page, offset, bytes, i); status = iomap_write_end(iter, pos, bytes, copied, folio); - - if (unlikely(copied != status)) - iov_iter_revert(i, copied - status); - - cond_resched(); - if (unlikely(status == 0)) { + if (unlikely(status)) { /* * A short copy made iomap_write_end() reject the * thing entirely. Might be memory poisoning * halfway through, might be a race with munmap, * might be severe memory pressure. */ - if (copied) + if (copied) { + iov_iter_revert(i, copied); bytes = copied; + } goto again; } - pos += status; - written += status; - length -= status; + pos += bytes; + written += bytes; + length -= bytes; + cond_resched(); balance_dirty_pages_ratelimited(iter->inode->i_mapping); } while (iov_iter_count(i) && length); @@ -844,7 +842,6 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter) const struct iomap *srcmap = iomap_iter_srcmap(iter); loff_t pos = iter->pos; loff_t length = iomap_length(iter); - long status = 0; loff_t written = 0; /* don't bother with blocks that are not shared to start with */ @@ -858,20 +855,21 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter) unsigned long offset = offset_in_page(pos); unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length); struct folio *folio; + int status; status = iomap_write_begin(iter, pos, bytes, &folio); if (unlikely(status)) return status; status = iomap_write_end(iter, pos, bytes, bytes, folio); - if (WARN_ON_ONCE(status == 0)) - return -EIO; + if (WARN_ON_ONCE(status)) + return status; cond_resched(); - pos += status; - written += status; - length -= status; + pos += bytes; + written += bytes; + length -= bytes; balance_dirty_pages_ratelimited(iter->inode->i_mapping); } while (length); @@ -925,9 +923,9 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero) folio_zero_range(folio, offset, bytes); folio_mark_accessed(folio); - bytes = iomap_write_end(iter, pos, bytes, bytes, folio); - if (WARN_ON_ONCE(bytes == 0)) - return -EIO; + status = iomap_write_end(iter, pos, bytes, bytes, folio); + if (WARN_ON_ONCE(status)) + return status; pos += bytes; length -= bytes;
On Wed, May 4, 2022 at 4:22 PM Christoph Hellwig <hch@infradead.org> wrote: > On Wed, May 04, 2022 at 01:27:36AM +0100, Matthew Wilcox wrote: > > This is one of the things I noticed when folioising iomap and didn't > > get round to cleaning up, but I feel like we should change the calling > > convention here to bool (true = success, false = fail). Changing > > block_write_end() might not be on the cards, unless someone's really > > motivated, but we can at least change iomap_write_end() to not have this > > stupid calling convention. > > I kinda hate the bools for something that is not a simple > > if (foo())) > return; > > as propagating them is a bit of a mess. I do however thing that > switching to 0 / -errno might work nicely here, completely untested > patch below: This doesn't really strike me as better, just different. > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 8bc0989cf447fa..764174e2f1a183 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -685,13 +685,13 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len, > * redo the whole thing. > */ > if (unlikely(copied < len && !folio_test_uptodate(folio))) > - return 0; > + return -EIO; > iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len); > filemap_dirty_folio(inode->i_mapping, folio); > - return copied; > + return 0; > } > > -static size_t iomap_write_end_inline(const struct iomap_iter *iter, > +static void iomap_write_end_inline(const struct iomap_iter *iter, > struct folio *folio, loff_t pos, size_t copied) > { > const struct iomap *iomap = &iter->iomap; > @@ -706,23 +706,22 @@ static size_t iomap_write_end_inline(const struct iomap_iter *iter, > kunmap_local(addr); > > mark_inode_dirty(iter->inode); > - return copied; > } > > -/* Returns the number of bytes copied. May be 0. Cannot be an errno. */ > -static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > +static int iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > size_t copied, struct folio *folio) > { > const struct iomap_page_ops *page_ops = iter->iomap.page_ops; > const struct iomap *srcmap = iomap_iter_srcmap(iter); > loff_t old_size = iter->inode->i_size; > - size_t ret; > + int ret = 0; > > if (srcmap->type == IOMAP_INLINE) { > - ret = iomap_write_end_inline(iter, folio, pos, copied); > + iomap_write_end_inline(iter, folio, pos, copied); > } else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) { > - ret = block_write_end(NULL, iter->inode->i_mapping, pos, len, > - copied, &folio->page, NULL); > + if (block_write_end(NULL, iter->inode->i_mapping, pos, len, > + copied, &folio->page, NULL) < len) > + ret = -EIO; > } else { > ret = __iomap_write_end(iter->inode, pos, len, copied, folio); > } > @@ -732,8 +731,8 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > * cache. It's up to the file system to write the updated size to disk, > * preferably after I/O completion so that no stale data is exposed. > */ > - if (pos + ret > old_size) { > - i_size_write(iter->inode, pos + ret); > + if (!ret && pos + len > old_size) { > + i_size_write(iter->inode, pos + len); > iter->iomap.flags |= IOMAP_F_SIZE_CHANGED; > } > folio_unlock(folio); > @@ -741,10 +740,11 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > if (old_size < pos) > pagecache_isize_extended(iter->inode, old_size, pos); > if (page_ops && page_ops->page_done) > - page_ops->page_done(iter->inode, pos, ret, &folio->page); > + page_ops->page_done(iter->inode, pos, ret ? ret : len, > + &folio->page); > folio_put(folio); > > - if (ret < len) > + if (ret) > iomap_write_failed(iter->inode, pos, len); > return ret; > } > @@ -754,7 +754,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > loff_t length = iomap_length(iter); > loff_t pos = iter->pos; > ssize_t written = 0; > - long status = 0; > + int status = 0; > > do { > struct folio *folio; > @@ -792,26 +792,24 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > copied = copy_page_from_iter_atomic(page, offset, bytes, i); > > status = iomap_write_end(iter, pos, bytes, copied, folio); > - > - if (unlikely(copied != status)) > - iov_iter_revert(i, copied - status); > - > - cond_resched(); > - if (unlikely(status == 0)) { > + if (unlikely(status)) { > /* > * A short copy made iomap_write_end() reject the > * thing entirely. Might be memory poisoning > * halfway through, might be a race with munmap, > * might be severe memory pressure. > */ > - if (copied) > + if (copied) { > + iov_iter_revert(i, copied); > bytes = copied; > + } > goto again; > } > - pos += status; > - written += status; > - length -= status; > + pos += bytes; > + written += bytes; > + length -= bytes; Ought to turn 'status' into 'copied' here, not 'bytes'. > > + cond_resched(); > balance_dirty_pages_ratelimited(iter->inode->i_mapping); > } while (iov_iter_count(i) && length); > > @@ -844,7 +842,6 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter) > const struct iomap *srcmap = iomap_iter_srcmap(iter); > loff_t pos = iter->pos; > loff_t length = iomap_length(iter); > - long status = 0; > loff_t written = 0; > > /* don't bother with blocks that are not shared to start with */ > @@ -858,20 +855,21 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter) > unsigned long offset = offset_in_page(pos); > unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length); > struct folio *folio; > + int status; > > status = iomap_write_begin(iter, pos, bytes, &folio); > if (unlikely(status)) > return status; > > status = iomap_write_end(iter, pos, bytes, bytes, folio); > - if (WARN_ON_ONCE(status == 0)) > - return -EIO; > + if (WARN_ON_ONCE(status)) > + return status; > > cond_resched(); > > - pos += status; > - written += status; > - length -= status; > + pos += bytes; > + written += bytes; > + length -= bytes; > > balance_dirty_pages_ratelimited(iter->inode->i_mapping); > } while (length); > @@ -925,9 +923,9 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero) > folio_zero_range(folio, offset, bytes); > folio_mark_accessed(folio); > > - bytes = iomap_write_end(iter, pos, bytes, bytes, folio); > - if (WARN_ON_ONCE(bytes == 0)) > - return -EIO; > + status = iomap_write_end(iter, pos, bytes, bytes, folio); > + if (WARN_ON_ONCE(status)) > + return status; > > pos += bytes; > length -= bytes; > Thanks, Andreas
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 358ee1fb6f0d..8fb9b2797fc5 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -734,7 +734,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, folio_put(folio); if (ret < len) - iomap_write_failed(iter->inode, pos, len); + iomap_write_failed(iter->inode, pos + ret, len - ret); return ret; }
In iomap_write_end(), only call iomap_write_failed() on the byte range that has failed. This should improve code readability, but doesn't fix an actual bug because iomap_write_failed() is called after updating the file size here and it only affects the memory beyond the end of the file. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> --- fs/iomap/buffered-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)