Message ID | 34dafb5e15dba3bb0b0e072404ac6fb9f11561b8.1677428794.git.ritesh.list@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | iomap: Add support for subpage dirty state tracking to improve write performance |
On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
> Earlier when the folio is uptodate, we only allocate iop at writeback
> time (in iomap_writepage_map()). This is ok until now, but when we are
> going to add support for subpage size dirty bitmap tracking in iop, this
> could cause some performance degradation. The reason is that if we don't
> allocate iop during ->write_begin(), then we will never mark the
> necessary dirty bits in ->write_end() call. And we will have to mark all
> the bits as dirty at the writeback time, that could cause the same write
> amplification and performance problems as it is now (w/o subpage dirty
> bitmap tracking in iop).
>
> However, for all the writes with (pos, len) which completely overlaps
> the given folio, there is no need to allocate an iop during
> ->write_begin(). So skip those cases.
>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/iomap/buffered-io.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 356193e44cf0..c5b51ab1184e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>  	size_t poff, plen;
>
> +	if (pos <= folio_pos(folio) &&
> +	    pos + len >= folio_pos(folio) + folio_size(folio))
> +		return 0;

This is magic without a comment explaining why it exists. You have
that explanation in the commit message, but that doesn't help anyone
looking at the code:

	/*
	 * If the write completely overlaps the current folio, then
	 * entire folio will be dirtied so there is no need for
	 * sub-folio state tracking structures to be attached to this folio.
	 */

-Dave.
On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
> +++ b/fs/iomap/buffered-io.c
> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>  	size_t poff, plen;
>
> +	if (pos <= folio_pos(folio) &&
> +	    pos + len >= folio_pos(folio) + folio_size(folio))
> +		return 0;
> +
> +	iop = iomap_page_create(iter->inode, folio, iter->flags);
> +
>  	if (folio_test_uptodate(folio))
>  		return 0;
>  	folio_clear_error(folio);
>
> -	iop = iomap_page_create(iter->inode, folio, iter->flags);
>  	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
>  		return -EAGAIN;

Don't you want to move the -EAGAIN check up too? Otherwise an
io_uring write will dirty the entire folio rather than a block.

It occurs to me (even though I was the one who suggested the current
check) that pos <= folio_pos etc is actually a bit tighter than
necessary. We could get away with:

	if (pos < folio_pos(folio) + block_size &&
	    pos + len > folio_pos(folio) + folio_size(folio) - block_size)

since that will also cause the entire folio to be dirtied. Not sure if
it's worth it.
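To make the difference between the two conditions concrete, here is a minimal userspace sketch, not kernel code. `struct folio_range`, `write_covers_folio()` and `write_touches_all_blocks()` are hypothetical names invented for illustration; the struct fields stand in for what `folio_pos()` and `folio_size()` return, and `block_size` is assumed to divide the folio size evenly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a folio: its byte offset in the file and its size. */
struct folio_range {
	int64_t pos;	/* models folio_pos(folio) */
	size_t size;	/* models folio_size(folio) */
};

/* Strict check from the posted patch: the write covers every byte of the folio. */
static bool write_covers_folio(const struct folio_range *f,
			       int64_t pos, size_t len)
{
	return pos <= f->pos &&
	       pos + (int64_t)len >= f->pos + (int64_t)f->size;
}

/*
 * Relaxed check suggested in the review: the write starts inside the first
 * block and ends inside the last block, so every block is at least partly
 * written.
 */
static bool write_touches_all_blocks(const struct folio_range *f,
				     int64_t pos, size_t len,
				     size_t block_size)
{
	/* Assumption of this sketch: the folio is a whole number of blocks. */
	assert(block_size > 0 && f->size % block_size == 0);

	return pos < f->pos + (int64_t)block_size &&
	       pos + (int64_t)len >
		       f->pos + (int64_t)f->size - (int64_t)block_size;
}
```

With a 16384-byte folio at offset 16384 and 4096-byte blocks, a write of 16184 bytes starting at offset 16484 fails the strict check (it starts 100 bytes into the folio) but passes the relaxed one, because it still touches the first and the last block.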
Dave Chinner <david@fromorbit.com> writes:

> On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
>> Earlier when the folio is uptodate, we only allocate iop at writeback
>> time (in iomap_writepage_map()). This is ok until now, but when we are
>> going to add support for subpage size dirty bitmap tracking in iop, this
>> could cause some performance degradation. The reason is that if we don't
>> allocate iop during ->write_begin(), then we will never mark the
>> necessary dirty bits in ->write_end() call. And we will have to mark all
>> the bits as dirty at the writeback time, that could cause the same write
>> amplification and performance problems as it is now (w/o subpage dirty
>> bitmap tracking in iop).
>>
>> However, for all the writes with (pos, len) which completely overlaps
>> the given folio, there is no need to allocate an iop during
>> ->write_begin(). So skip those cases.
>>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>>  fs/iomap/buffered-io.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 356193e44cf0..c5b51ab1184e 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>>  	size_t poff, plen;
>>
>> +	if (pos <= folio_pos(folio) &&
>> +	    pos + len >= folio_pos(folio) + folio_size(folio))
>> +		return 0;
>
> This is magic without a comment explaining why it exists. You have
> that explanation in the commit message, but that doesn't help anyone
> looking at the code:
>
> 	/*
> 	 * If the write completely overlaps the current folio, then
> 	 * entire folio will be dirtied so there is no need for
> 	 * sub-folio state tracking structures to be attached to this folio.
> 	 */

Sure, got it. I will add a comment which explains this in the code as well.

Thanks for the review!
-ritesh
Matthew Wilcox <willy@infradead.org> writes:

> On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
>> +++ b/fs/iomap/buffered-io.c
>> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>>  	size_t poff, plen;
>>
>> +	if (pos <= folio_pos(folio) &&
>> +	    pos + len >= folio_pos(folio) + folio_size(folio))
>> +		return 0;
>> +
>> +	iop = iomap_page_create(iter->inode, folio, iter->flags);
>> +
>>  	if (folio_test_uptodate(folio))
>>  		return 0;
>>  	folio_clear_error(folio);
>>
>> -	iop = iomap_page_create(iter->inode, folio, iter->flags);
>>  	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
>>  		return -EAGAIN;
>
> Don't you want to move the -EAGAIN check up too? Otherwise an
> io_uring write will dirty the entire folio rather than a block.

I am not entirely convinced whether we should move this check up
(to put it just after the iop allocation). The reason is if the folio is
uptodate then it is ok to return 0 rather than -EAGAIN, because we are
anyway not going to read the folio from disk (given it is completely
uptodate).

Thoughts? Or am I missing anything here.

>
> It occurs to me (even though I was the one who suggested the current
> check) that pos <= folio_pos etc is actually a bit tighter than
> necessary. We could get away with:
>
> 	if (pos < folio_pos(folio) + block_size &&
> 	    pos + len > folio_pos(folio) + folio_size(folio) - block_size)
>
> since that will also cause the entire folio to be dirtied. Not sure if
> it's worth it.

I am not sure of how much impact such a change can cause. But I agree
that the above check is much lighter in terms of restriction.

Let me spend some more time thinking it through.

Thanks for the review!
-ritesh
On Wed, Mar 01, 2023 at 12:03:48AM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
>
> > On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
> >> +++ b/fs/iomap/buffered-io.c
> >> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> >>  	size_t from = offset_in_folio(folio, pos), to = from + len;
> >>  	size_t poff, plen;
> >>
> >> +	if (pos <= folio_pos(folio) &&
> >> +	    pos + len >= folio_pos(folio) + folio_size(folio))
> >> +		return 0;
> >> +
> >> +	iop = iomap_page_create(iter->inode, folio, iter->flags);
> >> +
> >>  	if (folio_test_uptodate(folio))
> >>  		return 0;
> >>  	folio_clear_error(folio);
> >>
> >> -	iop = iomap_page_create(iter->inode, folio, iter->flags);
> >>  	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
> >>  		return -EAGAIN;
> >
> > Don't you want to move the -EAGAIN check up too? Otherwise an
> > io_uring write will dirty the entire folio rather than a block.
>
> I am not entirely convinced whether we should move this check up
> (to put it just after the iop allocation). The reason is if the folio is
> uptodate then it is ok to return 0 rather than -EAGAIN, because we are
> anyway not going to read the folio from disk (given it is completely
> uptodate).
>
> Thoughts? Or am I missing anything here.

But then we won't have an iop, so a write will dirty the entire folio
instead of just the blocks you want to dirty.
Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Mar 01, 2023 at 12:03:48AM +0530, Ritesh Harjani wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>>
>> > On Mon, Feb 27, 2023 at 01:13:30AM +0530, Ritesh Harjani (IBM) wrote:
>> >> +++ b/fs/iomap/buffered-io.c
>> >> @@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>> >>  	size_t from = offset_in_folio(folio, pos), to = from + len;
>> >>  	size_t poff, plen;
>> >>
>> >> +	if (pos <= folio_pos(folio) &&
>> >> +	    pos + len >= folio_pos(folio) + folio_size(folio))
>> >> +		return 0;
>> >> +
>> >> +	iop = iomap_page_create(iter->inode, folio, iter->flags);
>> >> +
>> >>  	if (folio_test_uptodate(folio))
>> >>  		return 0;
>> >>  	folio_clear_error(folio);
>> >>
>> >> -	iop = iomap_page_create(iter->inode, folio, iter->flags);
>> >>  	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
>> >>  		return -EAGAIN;
>> >
>> > Don't you want to move the -EAGAIN check up too? Otherwise an
>> > io_uring write will dirty the entire folio rather than a block.
>>
>> I am not entirely convinced whether we should move this check up
>> (to put it just after the iop allocation). The reason is if the folio is
>> uptodate then it is ok to return 0 rather than -EAGAIN, because we are
>> anyway not going to read the folio from disk (given it is completely
>> uptodate).
>>
>> Thoughts? Or am I missing anything here.
>
> But then we won't have an iop, so a write will dirty the entire folio
> instead of just the blocks you want to dirty.

Ok, I got what you are saying. Make sense. I will give it a try.

Thanks
-ritesh
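The ordering question settled above can be modelled in a few lines of userspace C. This is only a sketch of the control flow under discussion, not the kernel function: `struct wb_model` and both function names are hypothetical, the booleans stand in for the real folio and iter state, and the final `return 0` is a placeholder for the rest of `__iomap_write_begin()`, which the thread does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the early part of the function inspects. */
struct wb_model {
	bool full_overlap;	/* write covers the whole folio */
	bool uptodate;		/* models folio_test_uptodate(folio) */
	bool nowait;		/* models iter->flags & IOMAP_NOWAIT */
	bool iop_allocated;	/* models iomap_page_create() != NULL */
	unsigned int nr_blocks;	/* blocks per folio */
};

/* Ordering as posted: the uptodate shortcut runs before the -EAGAIN check. */
static int write_begin_as_posted(const struct wb_model *m)
{
	assert(m);
	if (m->full_overlap)
		return 0;
	if (m->uptodate)
		return 0;	/* may return without an iop attached */
	if (m->nowait && !m->iop_allocated && m->nr_blocks > 1)
		return -EAGAIN;
	return 0;		/* placeholder for the remaining work */
}

/* Ordering with the -EAGAIN check moved up, right after the allocation. */
static int write_begin_check_moved_up(const struct wb_model *m)
{
	assert(m);
	if (m->full_overlap)
		return 0;
	if (m->nowait && !m->iop_allocated && m->nr_blocks > 1)
		return -EAGAIN;	/* caller can retry where allocation may block */
	if (m->uptodate)
		return 0;
	return 0;		/* placeholder for the remaining work */
}
```

For an uptodate multi-block folio where a NOWAIT write did not get an iop, the first variant returns 0 with no per-block state attached, while the second returns -EAGAIN, which is the difference the reviewers are pointing at.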
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..c5b51ab1184e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -535,11 +535,16 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;
 
+	if (pos <= folio_pos(folio) &&
+	    pos + len >= folio_pos(folio) + folio_size(folio))
+		return 0;
+
+	iop = iomap_page_create(iter->inode, folio, iter->flags);
+
 	if (folio_test_uptodate(folio))
 		return 0;
 	folio_clear_error(folio);
 
-	iop = iomap_page_create(iter->inode, folio, iter->flags);
 	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
 		return -EAGAIN;
Earlier when the folio is uptodate, we only allocate iop at writeback
time (in iomap_writepage_map()). This is ok until now, but when we are
going to add support for subpage size dirty bitmap tracking in iop, this
could cause some performance degradation. The reason is that if we don't
allocate iop during ->write_begin(), then we will never mark the
necessary dirty bits in ->write_end() call. And we will have to mark all
the bits as dirty at the writeback time, that could cause the same write
amplification and performance problems as it is now (w/o subpage dirty
bitmap tracking in iop).

However, for all the writes with (pos, len) which completely overlaps
the given folio, there is no need to allocate an iop during
->write_begin(). So skip those cases.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
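The write amplification described in the commit message can be illustrated with a small userspace sketch of a per-block dirty bitmap. Nothing here is the kernel's data structure: `struct folio_state` and its helpers are hypothetical, and the sketch assumes at most 64 blocks per folio so that one `uint64_t` can hold the bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-folio tracking state: one dirty bit per block. */
struct folio_state {
	unsigned int block_size;
	unsigned int nr_blocks;	/* assumed <= 64 in this sketch */
	uint64_t dirty;
};

/* What write_end could do when the state exists: dirty only touched blocks. */
static void mark_range_dirty(struct folio_state *fs, size_t off, size_t len)
{
	assert(fs->nr_blocks <= 64);
	assert(len > 0 && off + len <= (size_t)fs->block_size * fs->nr_blocks);

	unsigned int first = off / fs->block_size;
	unsigned int last = (off + len - 1) / fs->block_size;

	for (unsigned int i = first; i <= last; i++)
		fs->dirty |= UINT64_C(1) << i;
}

/* Fallback when no state existed at write time: every block becomes dirty. */
static void mark_all_dirty(struct folio_state *fs)
{
	assert(fs->nr_blocks <= 64);
	for (unsigned int i = 0; i < fs->nr_blocks; i++)
		fs->dirty |= UINT64_C(1) << i;
}

/* Bytes that writeback would have to submit for this folio. */
static size_t bytes_to_write_back(const struct folio_state *fs)
{
	size_t blocks = 0;

	for (unsigned int i = 0; i < fs->nr_blocks; i++)
		if (fs->dirty & (UINT64_C(1) << i))
			blocks++;
	return blocks * fs->block_size;
}
```

For a 64 KiB folio with 4 KiB blocks, a 200-byte write at offset 100 dirties one block, so `bytes_to_write_back()` reports 4096; if the state is only created at writeback and `mark_all_dirty()` is used instead, the same small write costs 65536 bytes of I/O.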