[v3,8/8] iomap: Copy larger chunks from userspace

Message ID 20230612203910.724378-9-willy@infradead.org
State Under Review
Series Create large folios in iomap buffered write path

Commit Message

Matthew Wilcox June 12, 2023, 8:39 p.m. UTC
If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
Start at the maximum page cache size and shrink by half every time we
hit the "we are short on memory" problem.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
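
As a rough illustration of the backoff (assuming a common x86-64
configuration with 4KiB pages and THP enabled, where MAX_PAGECACHE_ORDER
is the PMD order of 9): the starting chunk is 4KiB << 9 = 2MiB, and each
time the copy comes up short it is halved, 2MiB -> 1MiB -> ... -> 4KiB,
at which point it stops shrinking.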

Comments

Christoph Hellwig June 13, 2023, 4:58 a.m. UTC | #1
On Mon, Jun 12, 2023 at 09:39:10PM +0100, Matthew Wilcox (Oracle) wrote:
> If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> Start at the maximum page cache size and shrink by half every time we
> hit the "we are short on memory" problem.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a5d62c9640cf..818dc350ffc5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
>  	loff_t length = iomap_length(iter);
> +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;

This could overflow if the chunk size ends up bigger than 4GB, but
I guess that's mostly theoretical.
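
(To put a number on it: the shift could only wrap on a platform with a
32-bit size_t; a hypothetical 64KiB PAGE_SIZE combined with a
MAX_PAGECACHE_ORDER of 16 would give 64KiB << 16 = 4GiB, which does not
fit in 32 bits.  No real configuration comes close, as the reply below
explains.)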

> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);

Would be nice to avoid the overly long line here

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Matthew Wilcox June 13, 2023, 7:43 p.m. UTC | #2
On Mon, Jun 12, 2023 at 09:58:54PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 12, 2023 at 09:39:10PM +0100, Matthew Wilcox (Oracle) wrote:
> > If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> > Start at the maximum page cache size and shrink by half every time we
> > hit the "we are short on memory" problem.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/iomap/buffered-io.c | 22 +++++++++++++---------
> >  1 file changed, 13 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index a5d62c9640cf..818dc350ffc5 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
> >  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> >  {
> >  	loff_t length = iomap_length(iter);
> > +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
> 
> This could overflow if the chunk size ends up bigger than 4GB, but
> I guess that's mostly theoretical.

I don't think it can ... we currently restrict it to PMD_SIZE if THP are
enabled and order-8 if they're not.  I could add a MAX_PAGECACHE_SIZE if
needed, but PAGE_SIZE is 'unsigned long' on most if not all platforms,
so it's always the same size as size_t.  We definitely can't create
folios larger than size_t, so MAX_PAGECACHE_ORDER is never going to be
defined such that PAGE_SIZE << MAX_PAGECACHE_ORDER cannot fit in size_t.

The largest I can see it going would be on something like PowerPC with
its 16GB page size, and there PAGE_SIZE is definitely 1UL << PAGE_SHIFT,
i.e. an unsigned long.
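
For reference, the restriction described above is roughly how
MAX_PAGECACHE_ORDER is defined in include/linux/pagemap.h at this point
(paraphrased, so treat it as a sketch rather than the authoritative
definition):

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* PMD-sized folios when THP is enabled */
	#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
	#else
	/* otherwise cap folios at order-8, i.e. 256 pages */
	#define MAX_PAGECACHE_ORDER	8
	#endif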

> > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
> 
> Would be nice to avoid the overly long line here

The plan is to turn that into:

		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);

in the fairly near future.
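
A minimal sketch of that helper, assuming it simply forwards to the
existing page-based primitive (illustrative only, not the final
signature):

	static inline size_t copy_folio_from_iter_atomic(struct folio *folio,
			size_t offset, size_t bytes, struct iov_iter *i)
	{
		return copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
	}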

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!
Ritesh Harjani (IBM) June 17, 2023, 7:13 a.m. UTC | #3
"Matthew Wilcox (Oracle)" <willy@infradead.org> writes:

> If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> Start at the maximum page cache size and shrink by half every time we
> hit the "we are short on memory" problem.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a5d62c9640cf..818dc350ffc5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
>  	loff_t length = iomap_length(iter);
> +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>  	loff_t pos = iter->pos;
>  	ssize_t written = 0;
>  	long status = 0;
> @@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  
>  	do {
>  		struct folio *folio;
> -		struct page *page;
> -		unsigned long offset;	/* Offset into pagecache page */
> -		unsigned long bytes;	/* Bytes to write to page */
> +		size_t offset;		/* Offset into folio */
> +		unsigned long bytes;	/* Bytes to write to folio */

why not keep the type of "bytes" as size_t, the same as "copied"?

>  		size_t copied;		/* Bytes copied from user */
>  
> -		offset = offset_in_page(pos);
> -		bytes = min_t(unsigned long, PAGE_SIZE - offset,
> -						iov_iter_count(i));
>  again:
> +		offset = pos & (chunk - 1);
> +		bytes = min(chunk - offset, iov_iter_count(i));
>  		status = balance_dirty_pages_ratelimited_flags(mapping,
>  							       bdp_flags);
>  		if (unlikely(status))
> @@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  		if (iter->iomap.flags & IOMAP_F_STALE)
>  			break;
>  
> -		page = folio_file_page(folio, pos >> PAGE_SHIFT);
> +		offset = offset_in_folio(folio, pos);
> +		if (bytes > folio_size(folio) - offset)
> +			bytes = folio_size(folio) - offset;
> +
>  		if (mapping_writably_mapped(mapping))
> -			flush_dcache_page(page);
> +			flush_dcache_folio(folio);
>  
> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>  
>  		status = iomap_write_end(iter, pos, bytes, copied, folio);
>  
> @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  			 */
>  			if (copied)
>  				bytes = copied;

I think that with your change moving the "again" label, the lines above
doing bytes = copied become dead code.  We recalculate bytes right after
the "again" label anyway.
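
Condensed from the hunks above, the retry path after this patch looks
roughly like this (illustration only):

	again:
		offset = pos & (chunk - 1);
		bytes = min(chunk - offset, iov_iter_count(i));
		...
		if (unlikely(status == 0)) {
			if (copied)
				bytes = copied;	/* overwritten at 'again' */
			if (chunk > PAGE_SIZE)
				chunk /= 2;
			goto again;
		}

so the assignment to bytes can never be observed.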


-ritesh


> +			if (chunk > PAGE_SIZE)
> +				chunk /= 2;
>  			goto again;
>  		}
>  		pos += status;
> -- 
> 2.39.2
Matthew Wilcox June 19, 2023, 5:09 p.m. UTC | #4
On Sat, Jun 17, 2023 at 12:43:59PM +0530, Ritesh Harjani wrote:
> >  	do {
> >  		struct folio *folio;
> > -		struct page *page;
> > -		unsigned long offset;	/* Offset into pagecache page */
> > -		unsigned long bytes;	/* Bytes to write to page */
> > +		size_t offset;		/* Offset into folio */
> > +		unsigned long bytes;	/* Bytes to write to folio */
> 
> why not keep the type of "bytes" as size_t, the same as "copied"?

Sure, makes sense.

> > @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> >  			 */
> >  			if (copied)
> >  				bytes = copied;
> 
> I think that with your change moving the "again" label, the lines above
> doing bytes = copied become dead code.  We recalculate bytes right after
> the "again" label anyway.

Yes, you're right.  Removed.  I had a good think about whether this
forgotten removal meant an overlooked problem, but I can't see one.
Matthew Wilcox July 10, 2023, 3:45 a.m. UTC | #5
On Tue, Jun 13, 2023 at 08:43:49PM +0100, Matthew Wilcox wrote:
> > > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
> > 
> > Would be nice to avoid the overly long line here
> 
> The plan is to turn that into:
> 
> 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
> 
> in the fairly near future.

Kent bugged me to add copy_folio_from_iter_atomic() now, and he's right,
so "the near future" is v4.
Matthew Wilcox July 10, 2023, 3:57 a.m. UTC | #6
On Mon, Jun 19, 2023 at 06:09:42PM +0100, Matthew Wilcox wrote:
> > > @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> > >  			 */
> > >  			if (copied)
> > >  				bytes = copied;
> > 
> > I think that with your change moving the "again" label, the lines above
> > doing bytes = copied become dead code.  We recalculate bytes right after
> > the "again" label anyway.
> 
> Yes, you're right.  Removed.  I had a good think about whether this
> forgotten removal meant an overlooked problem, but I can't see one.

... also, removing this means that 'goto again' has the same effect as
'continue' ... which means we can actually restructure the loop slightly
and avoid the again label, the goto and even the continue.  Patch to
follow in a few hours.
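
A sketch of the shape that restructuring could take (an assumption, not
the actual follow-up patch): because offset and bytes are recomputed at
the top of every iteration, the short-copy case only needs to shrink
chunk and fall through to the loop condition:

		if (unlikely(status == 0)) {
			/*
			 * Short copy: back off to a smaller chunk and let
			 * the top of the loop recompute offset and bytes.
			 */
			if (chunk > PAGE_SIZE)
				chunk /= 2;
		} else {
			pos += status;
			written += status;
			length -= status;
		}
	} while (iov_iter_count(i) && length);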

Patch

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a5d62c9640cf..818dc350ffc5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,6 +768,7 @@  static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
+	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
@@ -776,15 +777,13 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		size_t offset;		/* Offset into folio */
+		unsigned long bytes;	/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
 
-		offset = offset_in_page(pos);
-		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-						iov_iter_count(i));
 again:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, iov_iter_count(i));
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 							       bdp_flags);
 		if (unlikely(status))
@@ -814,11 +813,14 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
-		page = folio_file_page(folio, pos >> PAGE_SHIFT);
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
+		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
 
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
@@ -835,6 +837,8 @@  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			 */
 			if (copied)
 				bytes = copied;
+			if (chunk > PAGE_SIZE)
+				chunk /= 2;
 			goto again;
 		}
 		pos += status;