Message ID | 20211108040551.1942823-20-willy@infradead.org (mailing list archive)
---|---
State | New, archived
Series | iomap/xfs folio patches
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> The zero iterator can work in folio-sized chunks instead of page-sized
> chunks.  This will save a lot of page cache lookups if the file is cached
> in multi-page folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

but it will clash with my just-sent series that decouples DAX zeroing
from buffered I/O zeroing and folds __iomap_zero_iter into the caller.
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> The zero iterator can work in folio-sized chunks instead of page-sized
> chunks.  This will save a lot of page cache lookups if the file is cached
> in multi-page folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

hch's dax decoupling series notwithstanding,

Though TBH I am kinda wondering how the two of you plan to resolve those
kinds of differences -- I haven't looked at that series, though I think
this one's been waiting in the wings for longer?

Heck, I wonder how Matthew plans to merge all this given that it touches
mm, fs, block, and iomap...?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/buffered-io.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 64e54981b651..9c61d12028ca 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +	struct folio *folio;
>  	struct page *page;
>  	int status;
> -	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> +	size_t offset, bytes;
>
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	status = iomap_write_begin(iter, pos, length, &page);
>  	if (status)
>  		return status;
> +	folio = page_folio(page);
>
> -	zero_user(page, offset, bytes);
> -	mark_page_accessed(page);
> +	offset = offset_in_folio(folio, pos);
> +	bytes = min_t(u64, folio_size(folio) - offset, length);
> +	folio_zero_range(folio, offset, bytes);
> +	folio_mark_accessed(folio);
>
>  	return iomap_write_end(iter, pos, bytes, bytes, page);
>  }
> --
> 2.33.0
>
On Tue, Nov 16, 2021 at 06:24:24PM -0800, Darrick J. Wong wrote:
> On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> > The zero iterator can work in folio-sized chunks instead of page-sized
> > chunks.  This will save a lot of page cache lookups if the file is cached
> > in multi-page folios.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> hch's dax decoupling series notwithstanding,
>
> Though TBH I am kinda wondering how the two of you plan to resolve those
> kinds of differences -- I haven't looked at that series, though I think
> this one's been waiting in the wings for longer?

I haven't looked at that series either.

> Heck, I wonder how Matthew plans to merge all this given that it touches
> mm, fs, block, and iomap...?

I'm planning on sending a pull request to Linus on Monday for the first
few patches in this series:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/for-next

Then I was hoping you'd take the block + fs/buffer + iomap pieces for
the next merge window.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!  Going through and collecting all these now ...
On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> +++ b/fs/iomap/buffered-io.c
> @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +	struct folio *folio;
>  	struct page *page;
>  	int status;
> -	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> +	size_t offset, bytes;
>
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	status = iomap_write_begin(iter, pos, length, &page);

This turned out to be buggy.  Darrick and I figured out why his tests
were failing and mine weren't; this only shows up with a 4kB block size
filesystem and I was only testing with 1kB block size filesystems.
(At least on x86; I haven't figured out why it passes with 1kB block
size filesystems, so I'm not sure what would be true on other
filesystems.)  iomap_write_begin() is not prepared to deal with a
length that spans a page boundary.  So I'm replacing this patch with
the following patches (whitespace damaged; pick them up from
https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
if you want to compile them):

commit 412212960b72
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Dec 9 15:47:44 2021 -0500

    iomap: Allow iomap_write_begin() to be called with the full length

    In the future, we want write_begin to know the entire length of the
    write so that it can choose to allocate large folios.  Pass the full
    length in from __iomap_zero_iter() and limit it where necessary.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d67108489148..9270db17c435 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	unsigned int blocks;

+	/* gfs2 does not support large folios yet */
+	if (len > PAGE_SIZE)
+		len = PAGE_SIZE;
 	blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
 	return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8d7a67655b60..67fcd3b9928d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 		goto out_no_page;
 	}
 	folio = page_folio(page);
+	if (pos + len > folio_pos(folio) + folio_size(folio))
+		len = folio_pos(folio) + folio_size(folio) - pos;

 	if (srcmap->type == IOMAP_INLINE)
 		status = iomap_write_begin_inline(iter, page);
@@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 	struct page *page;
 	int status;
 	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);

-	status = iomap_write_begin(iter, pos, bytes, &page);
+	if (length > UINT_MAX)
+		length = UINT_MAX;
+	status = iomap_write_begin(iter, pos, length, &page);
 	if (status)
 		return status;
+	if (length > PAGE_SIZE - offset)
+		length = PAGE_SIZE - offset;

-	zero_user(page, offset, bytes);
+	zero_user(page, offset, length);
 	mark_page_accessed(page);

-	return iomap_write_end(iter, pos, bytes, bytes, page);
+	return iomap_write_end(iter, pos, length, length, page);
 }

 static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)


commit 78c747a1b3a1
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Nov 5 14:24:09 2021 -0400

    iomap: Convert __iomap_zero_iter to use a folio

    The zero iterator can work in folio-sized chunks instead of page-sized
    chunks.  This will save a lot of page cache lookups if the file is
    cached in large folios.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 67fcd3b9928d..bbde6d4f27cd 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);

 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+	struct folio *folio;
 	struct page *page;
 	int status;
-	unsigned offset = offset_in_page(pos);
+	size_t offset;

 	if (length > UINT_MAX)
 		length = UINT_MAX;
 	status = iomap_write_begin(iter, pos, length, &page);
 	if (status)
 		return status;
-	if (length > PAGE_SIZE - offset)
-		length = PAGE_SIZE - offset;
+	folio = page_folio(page);

-	zero_user(page, offset, length);
-	mark_page_accessed(page);
+	offset = offset_in_folio(folio, pos);
+	if (length > folio_size(folio) - offset)
+		length = folio_size(folio) - offset;
+	folio_zero_range(folio, offset, length);
+	folio_mark_accessed(folio);

 	return iomap_write_end(iter, pos, length, length, page);
 }

The xfstests that Darrick identified as failing all passed.  Running a
full sweep now; then I'll re-run with a 1kB filesystem to be sure that
still passes.  Then I'll send another pull request.
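To see why the per-folio clamp matters, here is a minimal userspace
sketch of the arithmetic the first patch adds to iomap_write_begin():
the requested length is trimmed so it never extends past the end of the
folio found at pos.  The folio_stub type, clamp_to_folio() helper, and
all values are made-up stand-ins for illustration, not kernel code.

	#include <stdint.h>
	#include <stdio.h>

	/* Hypothetical stand-in for folio_pos()/folio_size(). */
	struct folio_stub {
		uint64_t pos;	/* file offset of the folio */
		uint64_t size;	/* folio size in bytes */
	};

	/*
	 * Trim len so that [pos, pos + len) never crosses the end of
	 * the folio, mirroring the check added to iomap_write_begin().
	 */
	static uint64_t clamp_to_folio(const struct folio_stub *folio,
				       uint64_t pos, uint64_t len)
	{
		if (pos + len > folio->pos + folio->size)
			len = folio->pos + folio->size - pos;
		return len;
	}

	int main(void)
	{
		/* A single-page (4kB) folio at file offset 0. */
		struct folio_stub folio = { .pos = 0, .size = 4096 };

		/*
		 * Zeroing 8kB starting at offset 2048 must be split:
		 * only the 2048 bytes left in this folio can be handled
		 * in one pass; the iterator loops for the rest.
		 */
		printf("%llu\n", (unsigned long long)
		       clamp_to_folio(&folio, 2048, 8192)); /* prints 2048 */
		return 0;
	}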
On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
> @@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  	struct page *page;
>  	int status;
>  	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	if (length > UINT_MAX)
> +		length = UINT_MAX;
> +	status = iomap_write_begin(iter, pos, length, &page);
>  	if (status)
>  		return status;
> +	if (length > PAGE_SIZE - offset)
> +		length = PAGE_SIZE - offset;
>
> -	zero_user(page, offset, bytes);
> +	zero_user(page, offset, length);
>  	mark_page_accessed(page);
>
> -	return iomap_write_end(iter, pos, bytes, bytes, page);
> +	return iomap_write_end(iter, pos, length, length, page);
>  }

After attempting the merge with Christoph's ill-timed refactoring, I
decided that eliding the use of 'bytes' here was the wrong approach,
because it very much needs to be put back in for the merge.  Here's the
merge as I have it:

diff --cc fs/iomap/buffered-io.c
index f3176cf90351,d1aa0f0e7fd5..40356db3e856
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@@ -888,19 -926,12 +904,23 @@@ static loff_t iomap_zero_iter(struct io
  		return length;

  	do {
- 		unsigned offset = offset_in_page(pos);
- 		size_t bytes = min_t(u64, PAGE_SIZE - offset, length);
- 		struct page *page;
 -		s64 bytes;
++		struct folio *folio;
 +		int status;
++		size_t offset;
++		size_t bytes = min_t(u64, SIZE_MAX, length);
 +
- 		status = iomap_write_begin(iter, pos, bytes, &page);
++		status = iomap_write_begin(iter, pos, bytes, &folio);
 +		if (status)
 +			return status;
 +
- 		zero_user(page, offset, bytes);
- 		mark_page_accessed(page);
++		offset = offset_in_folio(folio, pos);
++		if (bytes > folio_size(folio) - offset)
++			bytes = folio_size(folio) - offset;
++
++		folio_zero_range(folio, offset, bytes);
++		folio_mark_accessed(folio);

- 		bytes = iomap_write_end(iter, pos, bytes, bytes, page);
 -		if (IS_DAX(iter->inode))
 -			bytes = dax_iomap_zero(pos, length, iomap);
 -		else
 -			bytes = __iomap_zero_iter(iter, pos, length);
++		bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
  		if (bytes < 0)
  			return bytes;

I've pushed out a new tag:
https://git.infradead.org/users/willy/linux.git/shortlog/refs/tags/iomap-folio-5.17d
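For readability, the merged loop with the combined-diff markers
stripped resolves to the following.  The pos/length advance at the
bottom is outside the hunk above and is sketched from context, so treat
it as an assumption rather than the verified merge result:

	do {
		struct folio *folio;
		int status;
		size_t offset;
		size_t bytes = min_t(u64, SIZE_MAX, length);

		status = iomap_write_begin(iter, pos, bytes, &folio);
		if (status)
			return status;

		/* Trim the chunk to the folio that write_begin found. */
		offset = offset_in_folio(folio, pos);
		if (bytes > folio_size(folio) - offset)
			bytes = folio_size(folio) - offset;

		folio_zero_range(folio, offset, bytes);
		folio_mark_accessed(folio);

		bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
		if (bytes < 0)
			return bytes;

		/* Assumed tail, not shown in the hunk above. */
		pos += bytes;
		length -= bytes;
	} while (length > 0);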
On Fri, Dec 10, 2021 at 04:19:54PM +0000, Matthew Wilcox wrote:
> After attempting the merge with Christoph's ill-timed refactoring,

I did give you a heads-up before..

> I decided that eliding the use of 'bytes' here was the wrong approach,
> because it very much needs to be put back in for the merge.

Is there any good reason to not just delay the iomap_zero_iter folio
conversion for now?
On Sun, Dec 12, 2021 at 11:34:54PM -0800, Christoph Hellwig wrote:
> On Fri, Dec 10, 2021 at 04:19:54PM +0000, Matthew Wilcox wrote:
> > After attempting the merge with Christoph's ill-timed refactoring,
>
> I did give you a heads-up before..

I thought that was going in via Darrick's tree.  I had no idea Dan was
going to take it.

> > I decided that eliding the use of 'bytes' here was the wrong approach,
> > because it very much needs to be put back in for the merge.
>
> Is there any good reason to not just delay the iomap_zero_iter folio
> conversion for now?

It would hold up about half of the iomap folio conversion (~10
patches).  I don't understand what the benefit is of your patch series.
Moving filesystems away from being bdev based just doesn't seem
interesting to me.  Having DAX as an optional feature that some bdevs
have seems like a far superior option.
On Thu, Dec 09, 2021 at 09:38:03PM +0000, Matthew Wilcox wrote:
> On Mon, Nov 08, 2021 at 04:05:42AM +0000, Matthew Wilcox (Oracle) wrote:
> > +++ b/fs/iomap/buffered-io.c
> > @@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
> >
> >  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
> >  {
> > +	struct folio *folio;
> >  	struct page *page;
> >  	int status;
> > -	unsigned offset = offset_in_page(pos);
> > -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
> > +	size_t offset, bytes;
> >
> > -	status = iomap_write_begin(iter, pos, bytes, &page);
> > +	status = iomap_write_begin(iter, pos, length, &page);
>
> This turned out to be buggy.  Darrick and I figured out why his tests
> were failing and mine weren't; this only shows up with a 4kB block size
> filesystem and I was only testing with 1kB block size filesystems.
> (At least on x86; I haven't figured out why it passes with 1kB block
> size filesystems, so I'm not sure what would be true on other
> filesystems.)  iomap_write_begin() is not prepared to deal with a
> length that spans a page boundary.  So I'm replacing this patch with
> the following patches (whitespace damaged; pick them up from
> https://git.infradead.org/users/willy/linux.git/tag/refs/tags/iomap-folio-5.17c
> if you want to compile them):
>
> commit 412212960b72
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Thu Dec 9 15:47:44 2021 -0500
>
>     iomap: Allow iomap_write_begin() to be called with the full length
>
>     In the future, we want write_begin to know the entire length of the
>     write so that it can choose to allocate large folios.  Pass the full
>     length in from __iomap_zero_iter() and limit it where necessary.
>
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index d67108489148..9270db17c435 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -968,6 +968,9 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
>  	struct gfs2_sbd *sdp = GFS2_SB(inode);
>  	unsigned int blocks;
>
> +	/* gfs2 does not support large folios yet */
> +	if (len > PAGE_SIZE)
> +		len = PAGE_SIZE;

This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
supports large folios, so it should never be asked to deal with more
than a page at a time.  Shouldn't iomap_write_begin clamp its len
argument to PAGE_SIZE at the start if the mapping doesn't have the large
folios flag set?

--D

>  	blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
>  	return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
>  }
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 8d7a67655b60..67fcd3b9928d 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -632,6 +632,8 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>  		goto out_no_page;
>  	}
>  	folio = page_folio(page);
> +	if (pos + len > folio_pos(folio) + folio_size(folio))
> +		len = folio_pos(folio) + folio_size(folio) - pos;
>
>  	if (srcmap->type == IOMAP_INLINE)
>  		status = iomap_write_begin_inline(iter, page);
> @@ -891,16 +893,19 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  	struct page *page;
>  	int status;
>  	unsigned offset = offset_in_page(pos);
> -	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>
> -	status = iomap_write_begin(iter, pos, bytes, &page);
> +	if (length > UINT_MAX)
> +		length = UINT_MAX;
> +	status = iomap_write_begin(iter, pos, length, &page);
>  	if (status)
>  		return status;
> +	if (length > PAGE_SIZE - offset)
> +		length = PAGE_SIZE - offset;
>
> -	zero_user(page, offset, bytes);
> +	zero_user(page, offset, length);
>  	mark_page_accessed(page);
>
> -	return iomap_write_end(iter, pos, bytes, bytes, page);
> +	return iomap_write_end(iter, pos, length, length, page);
>  }
>
>  static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>
>
> commit 78c747a1b3a1
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Fri Nov 5 14:24:09 2021 -0400
>
>     iomap: Convert __iomap_zero_iter to use a folio
>
>     The zero iterator can work in folio-sized chunks instead of page-sized
>     chunks.  This will save a lot of page cache lookups if the file is
>     cached in large folios.
>
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Reviewed-by: Christoph Hellwig <hch@lst.de>
>     Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 67fcd3b9928d..bbde6d4f27cd 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -890,20 +890,23 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);
>
>  static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
>  {
> +	struct folio *folio;
>  	struct page *page;
>  	int status;
> -	unsigned offset = offset_in_page(pos);
> +	size_t offset;
>
>  	if (length > UINT_MAX)
>  		length = UINT_MAX;
>  	status = iomap_write_begin(iter, pos, length, &page);
>  	if (status)
>  		return status;
> -	if (length > PAGE_SIZE - offset)
> -		length = PAGE_SIZE - offset;
> +	folio = page_folio(page);
>
> -	zero_user(page, offset, length);
> -	mark_page_accessed(page);
> +	offset = offset_in_folio(folio, pos);
> +	if (length > folio_size(folio) - offset)
> +		length = folio_size(folio) - offset;
> +	folio_zero_range(folio, offset, length);
> +	folio_mark_accessed(folio);
>
>  	return iomap_write_end(iter, pos, length, length, page);
>  }
>
> The xfstests that Darrick identified as failing all passed.  Running a
> full sweep now; then I'll re-run with a 1kB filesystem to be sure that
> still passes.  Then I'll send another pull request.
On Thu, Dec 16, 2021 at 11:36:14AM -0800, Darrick J. Wong wrote:
> >
> > +	/* gfs2 does not support large folios yet */
> > +	if (len > PAGE_SIZE)
> > +		len = PAGE_SIZE;
>
> This is awkward -- gfs2 doesn't set the mapping flag to indicate that it
> supports large folios, so it should never be asked to deal with more
> than a page at a time.  Shouldn't iomap_write_begin clamp its len
> argument to PAGE_SIZE at the start if the mapping doesn't have the large
> folios flag set?

You're right, this is awkward.  And it's a bit of a beartrap for
another filesystem that wants to implement ->page_prepare in the future.

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 9270db17c435..d67108489148 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -968,9 +968,6 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	unsigned int blocks;

-	/* gfs2 does not support large folios yet */
-	if (len > PAGE_SIZE)
-		len = PAGE_SIZE;
 	blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
 	return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1a9e897ee25a..b1ded5204d1c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -619,6 +619,9 @@ static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
 	if (fatal_signal_pending(current))
 		return -EINTR;

+	if (!mapping_large_folio_support(iter->inode->i_mapping))
+		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
 	if (page_ops && page_ops->page_prepare) {
 		status = page_ops->page_prepare(iter->inode, pos, len);
 		if (status)
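To sanity-check that clamp with concrete numbers, here is a small
standalone C program; PAGE_SIZE and offset_in_page_stub() are userspace
stand-ins for the kernel's definitions, not the real ones.  A write
that starts 2 bytes before a page boundary, on a mapping without large
folio support, gets trimmed to those 2 bytes.

	#include <stddef.h>
	#include <stdio.h>

	#define PAGE_SIZE 4096ULL	/* stand-in for the kernel constant */

	/* Userspace stand-in for offset_in_page(): offset within a page. */
	static size_t offset_in_page_stub(unsigned long long pos)
	{
		return pos & (PAGE_SIZE - 1);
	}

	int main(void)
	{
		unsigned long long pos = 4094;	/* 2 bytes before a boundary */
		size_t len = 8192;
		size_t max = PAGE_SIZE - offset_in_page_stub(pos);

		/* Mapping without large folio support: one page at most. */
		if (len > max)
			len = max;
		printf("clamped len = %zu\n", len);	/* prints 2 */
		return 0;
	}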
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 64e54981b651..9c61d12028ca 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -881,17 +881,20 @@ EXPORT_SYMBOL_GPL(iomap_file_unshare);

 static s64 __iomap_zero_iter(struct iomap_iter *iter, loff_t pos, u64 length)
 {
+	struct folio *folio;
 	struct page *page;
 	int status;
-	unsigned offset = offset_in_page(pos);
-	unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
+	size_t offset, bytes;

-	status = iomap_write_begin(iter, pos, bytes, &page);
+	status = iomap_write_begin(iter, pos, length, &page);
 	if (status)
 		return status;
+	folio = page_folio(page);

-	zero_user(page, offset, bytes);
-	mark_page_accessed(page);
+	offset = offset_in_folio(folio, pos);
+	bytes = min_t(u64, folio_size(folio) - offset, length);
+	folio_zero_range(folio, offset, bytes);
+	folio_mark_accessed(folio);

 	return iomap_write_end(iter, pos, bytes, bytes, page);
 }
The zero iterator can work in folio-sized chunks instead of page-sized
chunks.  This will save a lot of page cache lookups if the file is
cached in multi-page folios.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/iomap/buffered-io.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
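As a rough illustration of the saving claimed here (back-of-the-envelope
numbers, not measurements from the patch): each chunk costs one
iomap_write_begin() call and hence one page cache lookup, so zeroing a
2MB cached extent takes 512 lookups in page-sized chunks but a single
lookup if it is cached in one 2MB folio.  A standalone sketch of the
arithmetic:

	#include <stdio.h>

	int main(void)
	{
		unsigned long range = 2UL << 20;	/* zero 2MB of cached data */
		unsigned long page  = 4096;		/* page-sized chunks */
		unsigned long folio = 2UL << 20;	/* one 2MB folio */

		/* One write_begin (one page cache lookup) per chunk. */
		printf("page-sized chunks:  %lu lookups\n", range / page);  /* 512 */
		printf("folio-sized chunks: %lu lookups\n", range / folio); /* 1 */
		return 0;
	}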