Message ID | 20201029132325.1663790-4-bfoster@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | misc iomap/xfs writeback fixes |
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Thu, Oct 29, 2020 at 09:23:25AM -0400, Brian Foster wrote:
> The iomap writepage error handling logic is a mash of old and
> slightly broken XFS writepage logic. When keepwrite writeback state
> tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure
> WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
> additional cluster writeback context that scanned ahead of
> ->writepage() to process dirty pages over the current ->writepage()
> extent mapping. This context expected a dirty page and required
> retention of the TOWRITE tag on partial page processing so the
> higher level writeback context would revisit the page (in contrast
> to ->writepage(), which passes a page with the dirty bit already
> cleared).
>
> The cluster writeback mechanism was eventually removed and some of
> the error handling logic folded into the primary writeback path in
> commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch
> accidentally conflated the two contexts by using the keepwrite logic
> in ->writepage() without accounting for the fact that the page is
> not dirty. Further, the keepwrite logic has no practical effect on
> the core ->writepage() caller (write_cache_pages()) because it never
> revisits a page in the current function invocation.
>
> Technically, the page should be redirtied for the keepwrite logic to
> have any effect. Otherwise, write_cache_pages() may find the tagged
> page but will skip it since it is clean. Even if the page was
> redirtied, however, there is still no practical effect to keepwrite
> since write_cache_pages() does not wrap around within a single
> invocation of the function. Therefore, the dirty page would simply
> end up retagged on the next writeback sequence over the associated
> range.
>
> All that being said, none of this really matters because redirtying
> a partially processed page introduces a potential infinite redirty
> -> writeback failure loop that deviates from the current design
> principle of clearing the dirty state on writepage failure to avoid
> building up too much dirty, unreclaimable memory on the system.
> Therefore, drop the spurious keepwrite usage and dirty state
> clearing logic from iomap_writepage_map(), treat the partially
> processed page the same as a fully processed page, and let the
> imminent ioend failure clean up the writeback state.

...and run away before ext4 tries to port itself to buffered iomap,
since it's the only other user of keepwrite. Not sure why it ends up
in a state where it's doing writeback to a hole(?!)

> Signed-off-by: Brian Foster <bfoster@redhat.com>

Anyway this seems sensible to me...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/iomap/buffered-io.c | 15 ++-------------
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d1f04eabc7e4..e3a4568f6c2e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1404,6 +1404,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
>  	WARN_ON_ONCE(!PageLocked(page));
>  	WARN_ON_ONCE(PageWriteback(page));
> +	WARN_ON_ONCE(PageDirty(page));
>
>  	/*
>  	 * We cannot cancel the ioend directly here on error. We may have
> @@ -1425,21 +1426,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  			unlock_page(page);
>  			goto done;
>  		}
> -
> -		/*
> -		 * If the page was not fully cleaned, we need to ensure that the
> -		 * higher layers come back to it correctly. That means we need
> -		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
> -		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
> -		 * so another attempt to write this page in this writeback sweep
> -		 * will be made.
> -		 */
> -		set_page_writeback_keepwrite(page);
> -	} else {
> -		clear_page_dirty_for_io(page);
> -		set_page_writeback(page);
>  	}
>
> +	set_page_writeback(page);
>  	unlock_page(page);
>
>  	/*
> --
> 2.25.4
>
On 10/29/20 6:23 AM, Brian Foster wrote:
> The iomap writepage error handling logic is a mash of old and
> slightly broken XFS writepage logic. When keepwrite writeback state
> tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure
> WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
> additional cluster writeback context that scanned ahead of
> ->writepage() to process dirty pages over the current ->writepage()
> extent mapping. This context expected a dirty page and required
> retention of the TOWRITE tag on partial page processing so the
> higher level writeback context would revisit the page (in contrast
> to ->writepage(), which passes a page with the dirty bit already
> cleared).
>
> The cluster writeback mechanism was eventually removed and some of
> the error handling logic folded into the primary writeback path in
> commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch
> accidentally conflated the two contexts by using the keepwrite logic
> in ->writepage() without accounting for the fact that the page is
> not dirty. Further, the keepwrite logic has no practical effect on
> the core ->writepage() caller (write_cache_pages()) because it never
> revisits a page in the current function invocation.
>
> Technically, the page should be redirtied for the keepwrite logic to
> have any effect. Otherwise, write_cache_pages() may find the tagged
> page but will skip it since it is clean. Even if the page was
> redirtied, however, there is still no practical effect to keepwrite
> since write_cache_pages() does not wrap around within a single
> invocation of the function. Therefore, the dirty page would simply
> end up retagged on the next writeback sequence over the associated
> range.
>
> All that being said, none of this really matters because redirtying
> a partially processed page introduces a potential infinite redirty
> -> writeback failure loop that deviates from the current design
> principle of clearing the dirty state on writepage failure to avoid
> building up too much dirty, unreclaimable memory on the system.
> Therefore, drop the spurious keepwrite usage and dirty state
> clearing logic from iomap_writepage_map(), treat the partially
> processed page the same as a fully processed page, and let the
> imminent ioend failure clean up the writeback state.
>

Ok, thanks for all the explaining. Makes sense :-)

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 15 ++-------------
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d1f04eabc7e4..e3a4568f6c2e 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1404,6 +1404,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
>  	WARN_ON_ONCE(!PageLocked(page));
>  	WARN_ON_ONCE(PageWriteback(page));
> +	WARN_ON_ONCE(PageDirty(page));
>
>  	/*
>  	 * We cannot cancel the ioend directly here on error. We may have
> @@ -1425,21 +1426,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
>  			unlock_page(page);
>  			goto done;
>  		}
> -
> -		/*
> -		 * If the page was not fully cleaned, we need to ensure that the
> -		 * higher layers come back to it correctly. That means we need
> -		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
> -		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
> -		 * so another attempt to write this page in this writeback sweep
> -		 * will be made.
> -		 */
> -		set_page_writeback_keepwrite(page);
> -	} else {
> -		clear_page_dirty_for_io(page);
> -		set_page_writeback(page);
>  	}
>
> +	set_page_writeback(page);
>  	unlock_page(page);
>
>  	/*
>
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index d1f04eabc7e4..e3a4568f6c2e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1404,6 +1404,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 	WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
 	WARN_ON_ONCE(!PageLocked(page));
 	WARN_ON_ONCE(PageWriteback(page));
+	WARN_ON_ONCE(PageDirty(page));
 
 	/*
 	 * We cannot cancel the ioend directly here on error. We may have
@@ -1425,21 +1426,9 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 			unlock_page(page);
 			goto done;
 		}
-
-		/*
-		 * If the page was not fully cleaned, we need to ensure that the
-		 * higher layers come back to it correctly. That means we need
-		 * to keep the page dirty, and for WB_SYNC_ALL writeback we need
-		 * to ensure the PAGECACHE_TAG_TOWRITE index mark is not removed
-		 * so another attempt to write this page in this writeback sweep
-		 * will be made.
-		 */
-		set_page_writeback_keepwrite(page);
-	} else {
-		clear_page_dirty_for_io(page);
-		set_page_writeback(page);
 	}
 
+	set_page_writeback(page);
 	unlock_page(page);
 
 	/*
The iomap writepage error handling logic is a mash of old and
slightly broken XFS writepage logic. When keepwrite writeback state
tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure
WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
additional cluster writeback context that scanned ahead of
->writepage() to process dirty pages over the current ->writepage()
extent mapping. This context expected a dirty page and required
retention of the TOWRITE tag on partial page processing so the
higher level writeback context would revisit the page (in contrast
to ->writepage(), which passes a page with the dirty bit already
cleared).

The cluster writeback mechanism was eventually removed and some of
the error handling logic folded into the primary writeback path in
commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch
accidentally conflated the two contexts by using the keepwrite logic
in ->writepage() without accounting for the fact that the page is
not dirty. Further, the keepwrite logic has no practical effect on
the core ->writepage() caller (write_cache_pages()) because it never
revisits a page in the current function invocation.

Technically, the page should be redirtied for the keepwrite logic to
have any effect. Otherwise, write_cache_pages() may find the tagged
page but will skip it since it is clean. Even if the page was
redirtied, however, there is still no practical effect to keepwrite
since write_cache_pages() does not wrap around within a single
invocation of the function. Therefore, the dirty page would simply
end up retagged on the next writeback sequence over the associated
range.

All that being said, none of this really matters because redirtying
a partially processed page introduces a potential infinite redirty
-> writeback failure loop that deviates from the current design
principle of clearing the dirty state on writepage failure to avoid
building up too much dirty, unreclaimable memory on the system.
Therefore, drop the spurious keepwrite usage and dirty state
clearing logic from iomap_writepage_map(), treat the partially
processed page the same as a fully processed page, and let the
imminent ioend failure clean up the writeback state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/iomap/buffered-io.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)
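
The tagged-but-clean case described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_page and toy_tagged_sweep() are hypothetical names invented for illustration, not kernel APIs, and the loop is a heavily simplified stand-in for a tagged write_cache_pages() sweep (no locking, no xarray, no ->writepage() callback, no tag clearing).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page; both fields are illustrative only. */
struct toy_page {
	bool dirty;        /* models PageDirty() */
	bool towrite_tag;  /* models PAGECACHE_TAG_TOWRITE on the page's index */
};

/*
 * Simplified model of one tagged sweep: walk the pages once in index
 * order, never wrapping around, and hand only tagged pages that are
 * still dirty to "writeback". Returns the number of pages written.
 */
static size_t toy_tagged_sweep(struct toy_page *pages, size_t n)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].towrite_tag)
			continue;	/* not part of this sweep */
		if (!pages[i].dirty)
			continue;	/* tagged but clean: skipped */

		pages[i].dirty = false;	/* models clear_page_dirty_for_io() */
		written++;
	}
	return written;
}
```

In this model, a page left with the tag set but the dirty bit clear (the state that keepwrite produced from ->writepage()) is never counted by toy_tagged_sweep(), which matches the observation that retaining the tag alone has no effect.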