Message ID | 20210628172727.1894503-1-agruenba@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | iomap: small block problems | expand |
On Mon, Jun 28, 2021 at 07:27:25PM +0200, Andreas Gruenbacher wrote: > (1) In iomap_readpage_actor, an iomap_page is attached to the page even > for inline inodes. This is unnecessary because inline inodes don't need > iomap_page objects. That alone wouldn't cause any real issues, but when > iomap_read_inline_data copies the inline data into the page, it sets the > PageUptodate flag without setting iop->uptodate, causing an > inconsistency between the two. This will trigger a WARN_ON in > iomap_page_release. The fix should be not to allocate iomap_page > objects when reading from inline inodes (patch 1). I don't have a problem with this patch. > (2) When un-inlining an inode, we must allocate a page with an attached > iomap_page object (iomap_page_create) and initialize the iop->uptodate > bitmap (iomap_set_range_uptodate). We can't currently do that because > iomap_page_create and iomap_set_range_uptodate are not exported. That > could be fixed by exporting those functions, or by implementing an > additional helper as in patch 2. Which of the two would you prefer? Not hugely happy with either of these options, tbh. I'd rather we apply a patch akin to this one (plucked from the folio tree), so won't apply: @@ -1305,7 +1311,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, struct writeback_control *wbc, struct inode *inode, struct folio *folio, loff_t end_pos) { - struct iomap_page *iop = to_iomap_page(folio); + struct iomap_page *iop = iomap_page_create(inode, folio); struct iomap_ioend *ioend, *next; unsigned len = i_blocksize(inode); unsigned nblocks = i_blocks_per_folio(inode, folio); @@ -1313,7 +1319,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, int error = 0, count = 0, i; LIST_HEAD(submit_list); - WARN_ON_ONCE(nblocks > 1 && !iop); WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0); /* so permit pages without an iop to enter writeback and create an iop *then*. Would that solve your problem? > (3) We're not yet using iomap_page_mkwrite, so iomap_page objects don't > get created on .page_mkwrite, either. Part of the reason is that > iomap_page_mkwrite locks the page and then calls into the filesystem for > uninlining and for allocating backing blocks. This conflicts with the > gfs2 locking order: on gfs2, transactions must be started before locking > any pages. We can fix that by calling iomap_page_create from > gfs2_page_mkwrite, or by doing the uninlining and allocations before > calling iomap_page_mkwrite. I've implemented option 2 for now; see > here: I think this might also solve this problem?
On Mon, Jun 28, 2021 at 06:39:09PM +0100, Matthew Wilcox wrote: > Not hugely happy with either of these options, tbh. I'd rather we apply > a patch akin to this one (plucked from the folio tree), so won't apply: > so permit pages without an iop to enter writeback and create an iop > *then*. Would that solve your problem? It is the right thing to do, especially when combined with a feature patch to not bother to create the iomap_page structure on small block size file systems when the extent covers the whole page. > > > (3) We're not yet using iomap_page_mkwrite, so iomap_page objects don't > > get created on .page_mkwrite, either. Part of the reason is that > > iomap_page_mkwrite locks the page and then calls into the filesystem for > > uninlining and for allocating backing blocks. This conflicts with the > > gfs2 locking order: on gfs2, transactions must be started before locking > > any pages. We can fix that by calling iomap_page_create from > > gfs2_page_mkwrite, or by doing the uninlining and allocations before > > calling iomap_page_mkwrite. I've implemented option 2 for now; see > > here: > > I think this might also solve this problem? We'll still need to create the iomap_page structure for page_mkwrite if there is an extent boundary inside the page.
On Mon, Jun 28, 2021 at 7:40 PM Matthew Wilcox <willy@infradead.org> wrote: > On Mon, Jun 28, 2021 at 07:27:25PM +0200, Andreas Gruenbacher wrote: > > (1) In iomap_readpage_actor, an iomap_page is attached to the page even > > for inline inodes. This is unnecessary because inline inodes don't need > > iomap_page objects. That alone wouldn't cause any real issues, but when > > iomap_read_inline_data copies the inline data into the page, it sets the > > PageUptodate flag without setting iop->uptodate, causing an > > inconsistency between the two. This will trigger a WARN_ON in > > iomap_page_release. The fix should be not to allocate iomap_page > > objects when reading from inline inodes (patch 1). > > I don't have a problem with this patch. > > > (2) When un-inlining an inode, we must allocate a page with an attached > > iomap_page object (iomap_page_create) and initialize the iop->uptodate > > bitmap (iomap_set_range_uptodate). We can't currently do that because > > iomap_page_create and iomap_set_range_uptodate are not exported. That > > could be fixed by exporting those functions, or by implementing an > > additional helper as in patch 2. Which of the two would you prefer? > > Not hugely happy with either of these options, tbh. I'd rather we apply > a patch akin to this one (plucked from the folio tree), so won't apply: > > @@ -1305,7 +1311,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, > struct writeback_control *wbc, struct inode *inode, > struct folio *folio, loff_t end_pos) > { > - struct iomap_page *iop = to_iomap_page(folio); > + struct iomap_page *iop = iomap_page_create(inode, folio); > struct iomap_ioend *ioend, *next; > unsigned len = i_blocksize(inode); > unsigned nblocks = i_blocks_per_folio(inode, folio); > @@ -1313,7 +1319,6 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc, > int error = 0, count = 0, i; > LIST_HEAD(submit_list); > > - WARN_ON_ONCE(nblocks > 1 && !iop); > WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0); > > /* > > so permit pages without an iop to enter writeback and create an iop > *then*. Would that solve your problem? It probably would. Let me do some testing based on that. > > (3) We're not yet using iomap_page_mkwrite, so iomap_page objects don't > > get created on .page_mkwrite, either. Part of the reason is that > > iomap_page_mkwrite locks the page and then calls into the filesystem for > > uninlining and for allocating backing blocks. This conflicts with the > > gfs2 locking order: on gfs2, transactions must be started before locking > > any pages. We can fix that by calling iomap_page_create from > > gfs2_page_mkwrite, or by doing the uninlining and allocations before > > calling iomap_page_mkwrite. I've implemented option 2 for now; see > > here: > > I think this might also solve this problem? Probably yes. Thanks, Andreas
On Mon, Jun 28, 2021 at 8:07 PM Christoph Hellwig <hch@infradead.org> wrote: > On Mon, Jun 28, 2021 at 06:39:09PM +0100, Matthew Wilcox wrote: > > Not hugely happy with either of these options, tbh. I'd rather we apply > > a patch akin to this one (plucked from the folio tree), so won't apply: > > > so permit pages without an iop to enter writeback and create an iop > > *then*. Would that solve your problem? > > It is the right thing to do, especially when combined with a feature > patch to not bother to create the iomap_page structure on small > block size file systems when the extent covers the whole page. > > > > > > (3) We're not yet using iomap_page_mkwrite, so iomap_page objects don't > > > get created on .page_mkwrite, either. Part of the reason is that > > > iomap_page_mkwrite locks the page and then calls into the filesystem for > > > uninlining and for allocating backing blocks. This conflicts with the > > > gfs2 locking order: on gfs2, transactions must be started before locking > > > any pages. We can fix that by calling iomap_page_create from > > > gfs2_page_mkwrite, or by doing the uninlining and allocations before > > > calling iomap_page_mkwrite. I've implemented option 2 for now; see > > > here: > > > > I think this might also solve this problem? > > We'll still need to create the iomap_page structure for page_mkwrite > if there is an extent boundary inside the page. Yes, but the iop wouldn't need to be allocated in page_mkwrite; that would be taken care of by iomap_writepage / iomap_writepages as in the patch suggested by Matthew, right? Thanks, Andreas
On Mon, Jun 28, 2021 at 06:47:58PM +0100, Christoph Hellwig wrote: > On Mon, Jun 28, 2021 at 06:39:09PM +0100, Matthew Wilcox wrote: > > Not hugely happy with either of these options, tbh. I'd rather we apply > > a patch akin to this one (plucked from the folio tree), so won't apply: > > > so permit pages without an iop to enter writeback and create an iop > > *then*. Would that solve your problem? > > It is the right thing to do, especially when combined with a feature > patch to not bother to create the iomap_page structure on small > block size file systems when the extent covers the whole page. We don't know the extent layout at the point where *this* patch creates iomap_pages during writeback. I imagine we can delay creating one until we find out what our destination layout will be?
On Mon, Jun 28, 2021 at 10:59:55PM +0100, Matthew Wilcox wrote: > > > so permit pages without an iop to enter writeback and create an iop > > > *then*. Would that solve your problem? > > > > It is the right thing to do, especially when combined with a feature > > patch to not bother to create the iomap_page structure on small > > block size file systems when the extent covers the whole page. > > We don't know the extent layout at the point where *this* patch creates > iomap_pages during writeback. I imagine we can delay creating one until > we find out what our destination layout will be? Hmm. Actually ->page_mkwrite is always is always called on an uptodate page and we even assert that. I should have remembered the whole page fault path better. So yeah, I think we should take patch 1 from Andreas, then a non-folio version of your patch as a start. The next steps then would be in approximate order: 1. remove the iomap_page_create in iomap_page_mkwrite_actor as it clearly is not needed at that point 2. don't bother to create an iomap_page in iomap_readpage_actor when the iomap spans the whole page 3. don't create the iomap_page in __iomap_write_begin when the page is marked uptodate or the write covers the whole page delaying the creation further in iomap_writepage_map will be harder as the loop around iomap_add_to_ioend is still fundamentally block based.
On Tue, Jun 29, 2021 at 06:29:48AM +0100, Christoph Hellwig wrote: > Hmm. Actually ->page_mkwrite is always is always called on an uptodate > page and we even assert that. I should have remembered the whole page > fault path better. > > So yeah, I think we should take patch 1 from Andreas, then a non-folio > version of your patch as a start. The next steps then would be in > approximate order: > > 1. remove the iomap_page_create in iomap_page_mkwrite_actor as it > clearly is not needed at that point > 2. don't bother to create an iomap_page in iomap_readpage_actor when > the iomap spans the whole page > 3. don't create the iomap_page in __iomap_write_begin when the > page is marked uptodate or the write covers the whole page Further thoughts for a better series: 1. create iomap_page if needed in iomap_writepage_map 2. do not create the iomap_page at all in iomap_readpage_actor. ->readahead is always called on newly allocated pages, and ->readpage either on a clean !uptodate page or on one that has seen a write leading to a partial uptodate state. That is for the case that cares about the iomap_page it is present already 3. don't create the iomap_page in iomap_page_mkwrite_actor I think this is the simple initial series that should solve Andreas' problem. Then we can look into optimizing __iomap_write_begin and iomap_writepage_map further as needed.
Below is a version of your patch on top of v5.13 which has passed some
local testing here.
Thanks,
Andreas
--
iomap: Permit pages without an iop to enter writeback
Permit pages without an iop to enter writeback and create an iop *then*. This
allows filesystems to mark pages dirty without having to worry about how the
iop block tracking is implemented.
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/iomap/buffered-io.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 03537ecb2a94..6330dabc451e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1336,14 +1336,13 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
struct writeback_control *wbc, struct inode *inode,
struct page *page, u64 end_offset)
{
- struct iomap_page *iop = to_iomap_page(page);
+ struct iomap_page *iop = iomap_page_create(inode, page);
struct iomap_ioend *ioend, *next;
unsigned len = i_blocksize(inode);
u64 file_offset; /* file offset of page */
int error = 0, count = 0, i;
LIST_HEAD(submit_list);
- WARN_ON_ONCE(i_blocks_per_page(inode, page) > 1 && !iop);
WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);
/*
Darrick, On Tue, Jun 29, 2021 at 7:47 AM Christoph Hellwig <hch@infradead.org> wrote: > On Mon, Jun 28, 2021 at 10:59:55PM +0100, Matthew Wilcox wrote: > > > > so permit pages without an iop to enter writeback and create an iop > > > > *then*. Would that solve your problem? > > > > > > It is the right thing to do, especially when combined with a feature > > > patch to not bother to create the iomap_page structure on small > > > block size file systems when the extent covers the whole page. > > > > We don't know the extent layout at the point where *this* patch creates > > iomap_pages during writeback. I imagine we can delay creating one until > > we find out what our destination layout will be? > > Hmm. Actually ->page_mkwrite is always is always called on an uptodate > page and we even assert that. I should have remembered the whole page > fault path better. > > So yeah, I think we should take patch 1 from Andreas, then a non-folio > version of your patch as a start. will you pick up those two patches and push them to Linus? They both seem pretty safe. Thanks, Andreas
On Tue, Jun 29, 2021 at 11:12:39AM +0200, Andreas Gruenbacher wrote: > Below is a version of your patch on top of v5.13 which has passed some > local testing here. > > Thanks, > Andreas > > -- > > iomap: Permit pages without an iop to enter writeback > > Permit pages without an iop to enter writeback and create an iop *then*. This > allows filesystems to mark pages dirty without having to worry about how the > iop block tracking is implemented. How about ... Create an iop in the writeback path if one doesn't exist. This allows us to avoid creating the iop in some cases. The only current case we do that for is pages with inline data, but it can be extended to pages which are entirely within an extent. It also allows for an iop to be removed from pages in the future (eg page split). > Suggested-by: Matthew Wilcox <willy@infradead.org> > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
On Wed, Jun 30, 2021 at 4:09 PM Matthew Wilcox <willy@infradead.org> wrote: > On Tue, Jun 29, 2021 at 11:12:39AM +0200, Andreas Gruenbacher wrote: > > Below is a version of your patch on top of v5.13 which has passed some > > local testing here. > > > > Thanks, > > Andreas > > > > -- > > > > iomap: Permit pages without an iop to enter writeback > > > > Permit pages without an iop to enter writeback and create an iop *then*. This > > allows filesystems to mark pages dirty without having to worry about how the > > iop block tracking is implemented. > > How about ... > > Create an iop in the writeback path if one doesn't exist. This allows > us to avoid creating the iop in some cases. The only current case we > do that for is pages with inline data, but it can be extended to pages > which are entirely within an extent. It also allows for an iop to be > removed from pages in the future (eg page split). > > > Suggested-by: Matthew Wilcox <willy@infradead.org> > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> > > Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Sure, that works for me. Thanks, Andreas
On Wed, Jun 30, 2021 at 2:29 PM Andreas Gruenbacher <agruenba@redhat.com> wrote: > Darrick, > > will you pick up those two patches and push them to Linus? They both > seem pretty safe. Hello, is there anybody out there? I've put the two patches here with the sign-offs they've received: https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=for-next.iomap Thanks, Andreas
On Mon, Jul 05, 2021 at 05:51:22PM +0200, Andreas Gruenbacher wrote: > On Wed, Jun 30, 2021 at 2:29 PM Andreas Gruenbacher <agruenba@redhat.com> wrote: > > Darrick, > > > > will you pick up those two patches and push them to Linus? They both > > seem pretty safe. > > Hello, is there anybody out there? > > I've put the two patches here with the sign-offs they've received: Please send out an updated series likey everyone else.