Message ID | 20230612203910.724378-9-willy@infradead.org (mailing list archive)
---|---
State | Under Review
Series | Create large folios in iomap buffered write path
On Mon, Jun 12, 2023 at 09:39:10PM +0100, Matthew Wilcox (Oracle) wrote:
> If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> Start at the maximum page cache size and shrink by half every time we
> hit the "we are short on memory" problem.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/iomap/buffered-io.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index a5d62c9640cf..818dc350ffc5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
>  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
>  	loff_t length = iomap_length(iter);
> +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;

This could overflow if the chunk size ends up bigger than 4GB, but
I guess that's mostly theoretical.

> -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);

Would be nice to avoid the overly long line here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
On Mon, Jun 12, 2023 at 09:58:54PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 12, 2023 at 09:39:10PM +0100, Matthew Wilcox (Oracle) wrote:
> > If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
> > Start at the maximum page cache size and shrink by half every time we
> > hit the "we are short on memory" problem.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/iomap/buffered-io.c | 22 +++++++++++++---------
> >  1 file changed, 13 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index a5d62c9640cf..818dc350ffc5 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
> >  static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> >  {
> >  	loff_t length = iomap_length(iter);
> > +	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
>
> This could overflow if the chunk size ends up bigger than 4GB, but
> I guess that's mostly theoretical.

I don't think it can ... we currently restrict it to PMD_SIZE if THP
are enabled and order-8 if they're not.  I could add a
MAX_PAGECACHE_SIZE if needed, but PAGE_SIZE is 'unsigned long' on most
if not all platforms, so it's always the same size as size_t.  We
definitely can't create folios larger than size_t can represent, so
MAX_PAGECACHE_ORDER is never going to be defined such that
PAGE_SIZE << MAX_PAGECACHE_ORDER cannot fit in size_t.  The largest I
can see it going would be on something like PowerPC with its 16GB page
size, and there PAGE_SIZE is definitely 1UL << PAGE_SHIFT.

> > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
>
> Would be nice to avoid the overly long line here.

The plan is to turn that into:

	copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);

in the fairly near future.

> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!
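For context, the restriction Matthew describes comes from the
MAX_PAGECACHE_ORDER definition in include/linux/pagemap.h.  A sketch of
the definition as it stood around this time (reconstructed from memory,
so treat the exact values as an assumption):

	/* include/linux/pagemap.h -- approximate, kernels around v6.4 */
	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER	/* order 9 on x86-64 */
	#else
	#define MAX_PAGECACHE_ORDER	8	/* 1 MiB worth of 4 KiB pages */
	#endif

Since PAGE_SIZE is an unsigned long, PAGE_SIZE << MAX_PAGECACHE_ORDER
stays comfortably within size_t on both 32-bit and 64-bit builds.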
"Matthew Wilcox (Oracle)" <willy@infradead.org> writes: > If we have a large folio, we can copy in larger chunks than PAGE_SIZE. > Start at the maximum page cache size and shrink by half every time we > hit the "we are short on memory" problem. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > --- > fs/iomap/buffered-io.c | 22 +++++++++++++--------- > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index a5d62c9640cf..818dc350ffc5 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len, > static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > { > loff_t length = iomap_length(iter); > + size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER; > loff_t pos = iter->pos; > ssize_t written = 0; > long status = 0; > @@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > > do { > struct folio *folio; > - struct page *page; > - unsigned long offset; /* Offset into pagecache page */ > - unsigned long bytes; /* Bytes to write to page */ > + size_t offset; /* Offset into folio */ > + unsigned long bytes; /* Bytes to write to folio */ why not keep typeof "bytes" as size_t same as of "copied". > size_t copied; /* Bytes copied from user */ > > - offset = offset_in_page(pos); > - bytes = min_t(unsigned long, PAGE_SIZE - offset, > - iov_iter_count(i)); > again: > + offset = pos & (chunk - 1); > + bytes = min(chunk - offset, iov_iter_count(i)); > status = balance_dirty_pages_ratelimited_flags(mapping, > bdp_flags); > if (unlikely(status)) > @@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > if (iter->iomap.flags & IOMAP_F_STALE) > break; > > - page = folio_file_page(folio, pos >> PAGE_SHIFT); > + offset = offset_in_folio(folio, pos); > + if (bytes > folio_size(folio) - offset) > + bytes = folio_size(folio) - offset; > + > if (mapping_writably_mapped(mapping)) > - flush_dcache_page(page); > + flush_dcache_folio(folio); > > - copied = copy_page_from_iter_atomic(page, offset, bytes, i); > + copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i); > > status = iomap_write_end(iter, pos, bytes, copied, folio); > > @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i) > */ > if (copied) > bytes = copied; I think with your code change which changes the label position of "again", the above lines doing bytes = copied becomes dead code. We anyway recalculate bytes after "again" label. -ritesh > + if (chunk > PAGE_SIZE) > + chunk /= 2; > goto again; > } > pos += status; > -- > 2.39.2
On Sat, Jun 17, 2023 at 12:43:59PM +0530, Ritesh Harjani wrote:
> >  	do {
> >  		struct folio *folio;
> > -		struct page *page;
> > -		unsigned long offset;	/* Offset into pagecache page */
> > -		unsigned long bytes;	/* Bytes to write to page */
> > +		size_t offset;		/* Offset into folio */
> > +		unsigned long bytes;	/* Bytes to write to folio */
>
> Why not keep the type of "bytes" as size_t, the same as "copied"?

Sure, makes sense.

> > @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> >  		 */
> >  		if (copied)
> >  			bytes = copied;
>
> I think with your change moving the "again" label, the "bytes = copied"
> above becomes dead code: we recalculate "bytes" right after the "again"
> label anyway.

Yes, you're right.  Removed.  I had a good think about whether this
forgotten removal meant an overlooked problem, but I can't see one.
On Tue, Jun 13, 2023 at 08:43:49PM +0100, Matthew Wilcox wrote:
> > > -		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
> > > +		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
> >
> > Would be nice to avoid the overly long line here.
>
> The plan is to turn that into:
>
> 	copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
>
> in the fairly near future.

Kent bugged me to add copy_folio_from_iter_atomic() now, and he's
right, so "the near future" is v4.
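The helper being discussed is a one-liner.  A minimal sketch of what
copy_folio_from_iter_atomic() could look like, assuming it simply
forwards to the existing page-based helper (the actual v4 helper may
differ):

	/* Sketch: forwards to the page-based helper, hiding the
	 * &folio->page conversion and keeping call sites short.
	 */
	static inline size_t copy_folio_from_iter_atomic(struct folio *folio,
			size_t offset, size_t bytes, struct iov_iter *i)
	{
		return copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
	}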
On Mon, Jun 19, 2023 at 06:09:42PM +0100, Matthew Wilcox wrote:
> > > @@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
> > >  		 */
> > >  		if (copied)
> > >  			bytes = copied;
> >
> > I think with your change moving the "again" label, the "bytes = copied"
> > above becomes dead code: we recalculate "bytes" right after the "again"
> > label anyway.
>
> Yes, you're right.  Removed.  I had a good think about whether this
> forgotten removal meant an overlooked problem, but I can't see one.

... also, removing this means that 'goto again' has the same effect as
'continue' ... which means we can actually restructure the loop
slightly and avoid the again label, the goto and even the continue.
Patch to follow in a few hours.
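The follow-up patch itself is not in this thread, but the restructuring
described could look something like the sketch below, on the assumption
that a short copy simply falls through and lets the do/while loop retry
with the halved chunk (this is a sketch, not the actual patch):

	do {
		/* ... unchanged setup, write_begin, copy, write_end ... */
		if (unlikely(status == 0)) {
			/* Short copy rejected by iomap_write_end():
			 * shrink the chunk and let the loop retry.
			 */
			if (chunk > PAGE_SIZE)
				chunk /= 2;
		} else {
			pos += status;
			written += status;
			length -= status;
		}
	} while (iov_iter_count(i) && length);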
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a5d62c9640cf..818dc350ffc5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -768,6 +768,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 {
 	loff_t length = iomap_length(iter);
+	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
 	loff_t pos = iter->pos;
 	ssize_t written = 0;
 	long status = 0;
@@ -776,15 +777,13 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
-		struct page *page;
-		unsigned long offset;	/* Offset into pagecache page */
-		unsigned long bytes;	/* Bytes to write to page */
+		size_t offset;		/* Offset into folio */
+		unsigned long bytes;	/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
 
-		offset = offset_in_page(pos);
-		bytes = min_t(unsigned long, PAGE_SIZE - offset,
-						iov_iter_count(i));
 again:
+		offset = pos & (chunk - 1);
+		bytes = min(chunk - offset, iov_iter_count(i));
 		status = balance_dirty_pages_ratelimited_flags(mapping,
 							       bdp_flags);
 		if (unlikely(status))
@@ -814,11 +813,14 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
-		page = folio_file_page(folio, pos >> PAGE_SHIFT);
+		offset = offset_in_folio(folio, pos);
+		if (bytes > folio_size(folio) - offset)
+			bytes = folio_size(folio) - offset;
+
 		if (mapping_writably_mapped(mapping))
-			flush_dcache_page(page);
+			flush_dcache_folio(folio);
 
-		copied = copy_page_from_iter_atomic(page, offset, bytes, i);
+		copied = copy_page_from_iter_atomic(&folio->page, offset, bytes, i);
 
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
@@ -835,6 +837,8 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		 */
 		if (copied)
 			bytes = copied;
+		if (chunk > PAGE_SIZE)
+			chunk /= 2;
 		goto again;
 	}
 	pos += status;
If we have a large folio, we can copy in larger chunks than PAGE_SIZE.
Start at the maximum page cache size and shrink by half every time we
hit the "we are short on memory" problem.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)
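As a worked example of the shrink-by-half behaviour, assuming an x86-64
configuration with THP enabled (PAGE_SIZE = 4 KiB, MAX_PAGECACHE_ORDER
= 9, the PMD order):

	size_t chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;	/* 4 KiB << 9 = 2 MiB */

	/* Each time a short copy makes iomap_write_end() reject the write: */
	if (chunk > PAGE_SIZE)
		chunk /= 2;	/* 2 MiB -> 1 MiB -> ... -> 4 KiB, never smaller */

So in the worst case the copy size decays back to the old
single-page behaviour after nine retries, and never drops below
PAGE_SIZE.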