[RFCv5,4/5] iomap: Allocate iop in ->write_begin() early

Message ID	e8401f45b8e441dc70effdb6b71fb67a3c92f837.1683485700.git.ritesh.list@gmail.com (mailing list archive)
State	Mainlined, archived
Headers	Return-Path: <linux-fsdevel-owner@vger.kernel.org>
From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org,
        Matthew Wilcox <willy@infradead.org>,
        Dave Chinner <david@fromorbit.com>,
        Brian Foster <bfoster@redhat.com>,
        Ojaswin Mujoo <ojaswin@linux.ibm.com>,
        Disha Goel <disgoel@linux.ibm.com>,
        "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Subject: [RFCv5 4/5] iomap: Allocate iop in ->write_begin() early
Date: Mon, 8 May 2023 00:57:59 +0530
Message-Id: <e8401f45b8e441dc70effdb6b71fb67a3c92f837.1683485700.git.ritesh.list@gmail.com>
In-Reply-To: <cover.1683485700.git.ritesh.list@gmail.com>
References: <cover.1683485700.git.ritesh.list@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	iomap: Add support for per-block dirty state to improve write performance
[RFCv5,0/5] iomap: Add support for per-block dirty state to improve write performance
[RFCv5,1/5] iomap: Rename iomap_page_create/release() to iop_alloc/free()
[RFCv5,2/5] iomap: Refactor iop_set_range_uptodate() function
[RFCv5,3/5] iomap: Add iop's uptodate state handling functions
[RFCv5,4/5] iomap: Allocate iop in ->write_begin() early
[RFCv5,5/5] iomap: Add per-block dirty state tracking to improve performance

Commit Message

Ritesh Harjani (IBM) May 7, 2023, 7:27 p.m. UTC

Earlier when the folio is uptodate, we only allocate iop at writeback
time (in iomap_writepage_map()). This is ok until now, but when we are
going to add support for per-block dirty state bitmap in iop, this
could cause some performance degradation. The reason is that if we don't
allocate iop during ->write_begin(), then we will never mark the
necessary dirty bits in ->write_end() call. And we will have to mark all
the bits as dirty at the writeback time, that could cause the same write
amplification and performance problems as it is now.

However, for all the writes with (pos, len) which completely overlaps
the given folio, there is no need to allocate an iop during
->write_begin(). So skip those cases.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/buffered-io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
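
To make the motivation in the commit message concrete, here is a minimal userspace sketch of per-block dirty tracking. It is not the kernel code: `struct block_state`, `mark_range_dirty()` and `blocks_to_write()` are hypothetical names made up for illustration, and the model assumes at most 64 blocks per folio. It only shows why the state has to exist by the time ->write_end() runs: with no state attached, writeback has nothing to consult and must treat every block as dirty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-folio state ("iop").
 * This is an illustration only, not the kernel structure.
 */
struct block_state {
	unsigned int nr_blocks;	/* blocks in the folio, assumed <= 64 */
	uint64_t dirty;		/* one dirty bit per block */
};

/* Mark every block touched by the byte range [off, off + len) dirty. */
static void mark_range_dirty(struct block_state *bs, size_t block_size,
			     size_t off, size_t len)
{
	size_t first, last, i;

	if (len == 0 || block_size == 0)
		return;

	first = off / block_size;
	last = (off + len - 1) / block_size;
	for (i = first; i <= last && i < bs->nr_blocks; i++)
		bs->dirty |= UINT64_C(1) << i;
}

/*
 * Number of blocks writeback would have to submit. With no state
 * attached (bs == NULL) every block must be assumed dirty.
 */
static unsigned int blocks_to_write(const struct block_state *bs,
				    unsigned int nr_blocks)
{
	unsigned int i, n = 0;

	if (!bs)
		return nr_blocks;

	for (i = 0; i < bs->nr_blocks; i++)
		if (bs->dirty & (UINT64_C(1) << i))
			n++;
	return n;
}

/* Example: a 64k folio with 4k blocks and a 100-byte write at offset 4096. */
void block_state_example(void)
{
	struct block_state bs = { .nr_blocks = 16, .dirty = 0 };

	mark_range_dirty(&bs, 4096, 4096, 100);
	assert(blocks_to_write(&bs, 16) == 1);	/* state was attached early */
	assert(blocks_to_write(NULL, 16) == 16);	/* no state: write everything */
}
```

In this model the small write costs one block of writeback when the state was attached before the dirty bits were recorded, and sixteen when it was not, which is the write amplification the commit message describes.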

Comments

Christoph Hellwig May 18, 2023, 6:21 a.m. UTC | #1

On Mon, May 08, 2023 at 12:57:59AM +0530, Ritesh Harjani (IBM) wrote:
> Earlier when the folio is uptodate, we only allocate iop at writeback

s/Earlier/Currently/ ?

> time (in iomap_writepage_map()). This is ok until now, but when we are
> going to add support for per-block dirty state bitmap in iop, this
> could cause some performance degradation. The reason is that if we don't
> allocate iop during ->write_begin(), then we will never mark the
> necessary dirty bits in ->write_end() call. And we will have to mark all
> the bits as dirty at the writeback time, that could cause the same write
> amplification and performance problems as it is now.
> 
> However, for all the writes with (pos, len) which completely overlaps
> the given folio, there is no need to allocate an iop during
> ->write_begin(). So skip those cases.

This reads a bit backwards, I'd suggest to mention early
allocation only happens for sub-page writes before going into the
details.

The changes themselves look good to me.

Ritesh Harjani (IBM) May 19, 2023, 3:18 p.m. UTC | #2

Christoph Hellwig <hch@infradead.org> writes:

> On Mon, May 08, 2023 at 12:57:59AM +0530, Ritesh Harjani (IBM) wrote:
>> Earlier when the folio is uptodate, we only allocate iop at writeback
>
> s/Earlier/Currently/ ?
>
>> time (in iomap_writepage_map()). This is ok until now, but when we are
>> going to add support for per-block dirty state bitmap in iop, this
>> could cause some performance degradation. The reason is that if we don't
>> allocate iop during ->write_begin(), then we will never mark the
>> necessary dirty bits in ->write_end() call. And we will have to mark all
>> the bits as dirty at the writeback time, that could cause the same write
>> amplification and performance problems as it is now.
>>
>> However, for all the writes with (pos, len) which completely overlaps
>> the given folio, there is no need to allocate an iop during
>> ->write_begin(). So skip those cases.
>
> This reads a bit backwards, I'd suggest to mention early
> allocation only happens for sub-page writes before going into the
> details.
>

sub-page is a bit confusing here. Because we can have a large folio too
with blocks within that folio. So we decided to go with per-block
terminology [1].

[1]: https://lore.kernel.org/linux-xfs/ZFR%2FGuVca5nFlLYF@casper.infradead.org/

I am guessing you would like me to rewrite the above para. Is this better?

"We don't need to allocate an iop in ->write_begin() for writes where the
position and length completely overlap with the given folio.
Therefore, such cases are skipped."

> The changes themselves look good to me.

Sure. Thanks!

-ritesh

Matthew Wilcox May 19, 2023, 3:53 p.m. UTC | #3

On Fri, May 19, 2023 at 08:48:37PM +0530, Ritesh Harjani wrote:
> Christoph Hellwig <hch@infradead.org> writes:
> 
> > On Mon, May 08, 2023 at 12:57:59AM +0530, Ritesh Harjani (IBM) wrote:
> >> Earlier when the folio is uptodate, we only allocate iop at writeback
> >
> > s/Earlier/Currently/ ?
> >
> >> time (in iomap_writepage_map()). This is ok until now, but when we are
> >> going to add support for per-block dirty state bitmap in iop, this
> >> could cause some performance degradation. The reason is that if we don't
> >> allocate iop during ->write_begin(), then we will never mark the
> >> necessary dirty bits in ->write_end() call. And we will have to mark all
> >> the bits as dirty at the writeback time, that could cause the same write
> >> amplification and performance problems as it is now.
> >>
> >> However, for all the writes with (pos, len) which completely overlaps
> >> the given folio, there is no need to allocate an iop during
> >> ->write_begin(). So skip those cases.
> >
> > This reads a bit backwards, I'd suggest to mention early
> > allocation only happens for sub-page writes before going into the
> > details.
> >
> 
> sub-page is a bit confusing here. Because we can have a large folio too
> with blocks within that folio. So we decided to go with per-block
> terminology [1].
> 
> [1]: https://lore.kernel.org/linux-xfs/ZFR%2FGuVca5nFlLYF@casper.infradead.org/
> 
> I am guessing you would like me to rewrite the above para. Is this better?
> 
> "We don't need to allocate an iop in ->write_begin() for writes where the
> position and length completely overlap with the given folio.
> Therefore, such cases are skipped."

... and reorder that paragraph to be first.

Ritesh Harjani (IBM) May 22, 2023, 4:05 a.m. UTC | #4

Matthew Wilcox <willy@infradead.org> writes:

> On Fri, May 19, 2023 at 08:48:37PM +0530, Ritesh Harjani wrote:
>> Christoph Hellwig <hch@infradead.org> writes:
>>
>> > On Mon, May 08, 2023 at 12:57:59AM +0530, Ritesh Harjani (IBM) wrote:
>> >> Earlier when the folio is uptodate, we only allocate iop at writeback
>> >
>> > s/Earlier/Currently/ ?
>> >
>> >> time (in iomap_writepage_map()). This is ok until now, but when we are
>> >> going to add support for per-block dirty state bitmap in iop, this
>> >> could cause some performance degradation. The reason is that if we don't
>> >> allocate iop during ->write_begin(), then we will never mark the
>> >> necessary dirty bits in ->write_end() call. And we will have to mark all
>> >> the bits as dirty at the writeback time, that could cause the same write
>> >> amplification and performance problems as it is now.
>> >>
>> >> However, for all the writes with (pos, len) which completely overlaps
>> >> the given folio, there is no need to allocate an iop during
>> >> ->write_begin(). So skip those cases.
>> >
>> > This reads a bit backwards, I'd suggest to mention early
>> > allocation only happens for sub-page writes before going into the
>> > details.
>> >
>>
>> sub-page is a bit confusing here. Because we can have a large folio too
>> with blocks within that folio. So we decided to go with per-block
>> terminology [1].
>>
>> [1]: https://lore.kernel.org/linux-xfs/ZFR%2FGuVca5nFlLYF@casper.infradead.org/
>>
>> I am guessing you would like me to rewrite the above para. Is this better?
>>
>> "We don't need to allocate an iop in ->write_begin() for writes where the
>> position and length completely overlap with the given folio.
>> Therefore, such cases are skipped."
>
> ... and reorder that paragraph to be first.

Sure.

-ritesh

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 5103b644e115..25f20f269214 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -599,15 +599,25 @@  static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	size_t from = offset_in_folio(folio, pos), to = from + len;
 	size_t poff, plen;
 
-	if (folio_test_uptodate(folio))
+	/*
+	 * If the write completely overlaps the current folio, then
+	 * entire folio will be dirtied so there is no need for
+	 * per-block state tracking structures to be attached to this folio.
+	 */
+	if (pos <= folio_pos(folio) &&
+	    pos + len >= folio_pos(folio) + folio_size(folio))
 		return 0;
-	folio_clear_error(folio);
 
 	iop = iop_alloc(iter->inode, folio, iter->flags);
 
 	if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
 		return -EAGAIN;
 
+	if (folio_test_uptodate(folio))
+		return 0;
+	folio_clear_error(folio);
+
+
 	do {
 		iomap_adjust_read_range(iter->inode, folio, &block_start,
 				block_end - block_start, &poff, &plen);
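
The skip condition added in the hunk above can be exercised in isolation. The following is a userspace sketch, assuming `int64_t` as a stand-in for `loff_t`; `write_covers_folio()` and its `folio_start`/`folio_len` parameters are illustrative names standing in for `folio_pos(folio)` and `folio_size(folio)`, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of the full-overlap test from the patch.
 * Returns true when the write [pos, pos + len) covers the whole
 * folio [folio_start, folio_start + folio_len), i.e. the case in
 * which the patch returns early without allocating per-block state.
 */
static bool write_covers_folio(int64_t pos, size_t len,
			       int64_t folio_start, size_t folio_len)
{
	return pos <= folio_start &&
	       pos + (int64_t)len >= folio_start + (int64_t)folio_len;
}

/* Small self-check covering the three interesting cases. */
void check_write_covers_folio(void)
{
	/* Write exactly matches the folio: state allocation is skipped. */
	assert(write_covers_folio(0, 4096, 0, 4096));
	/* Write touches only part of the folio: state is allocated. */
	assert(!write_covers_folio(512, 1024, 0, 4096));
	/* Larger write that spans this folio entirely: skipped as well. */
	assert(write_covers_folio(0, 16384, 4096, 4096));
}
```

Only the middle case, a write that leaves some part of the folio untouched, needs the per-block structure at ->write_begin() time.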
