xfs: fix broken truncate pre-size update flushing

The pre-size update flush logic in xfs_setattr_size() has become
warped enough that it's difficult to surmise how it is actually
supposed to work. The original purpose of this flush is to mitigate
the "NULL file" problem when a truncate logs a size update before
preceding writes have been flushed and the fs happens to crash. This
code has seen several incremental changes since then that alter
behavior in potentially unexpected ways.

For context, the first in line is commit 5885ebda878b ("xfs: ensure
truncate forces zeroed blocks to disk"). This introduced the zeroing
check specifically for extending truncates and seems like a
straightforward enough change to accomplish that correctly.

Next, commit 68a9f5e7007c ("xfs: implement iomap based buffered
write path") switched over to iomap and introduced did_zeroing for
the truncate down case. This didn't change the flush range offsets,
however, which means partial post-eof zeroing on a truncate down may
only flush if the on-disk inode size hadn't been updated to reflect
the in-core size at the start of the truncate.

Sometime after that, commit 350976ae2187 ("xfs: truncate pagecache
before writeback in xfs_setattr_size()") reordered the flush to
after the pagecache truncate to prevent a stale data exposure due to
iomap not zeroing properly. This failed to account for the fact that
pagecache truncate doesn't mark pages dirty and thus leaves the
filesystem responsible for on-disk consistency. Therefore, post-eof
data exposure was still possible if a dirty page was cleaned before
pagecache truncate. This also introduced an off-by-one issue for the
newsize == oldsize scenario, which causes the flush to submit the EOF
page for I/O, but not actually wait on it if the offsets align to
the same page.
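
As a rough illustration of the off-by-one case above, the following
userspace sketch models a flush whose submit step selects pages by
page index while its wait step treats an inverted byte range as empty.
This is an assumption-laden model, not the kernel code: the helper
names, the 4 KiB page size, and the idea that the range passed to the
flush is [on-disk size, newsize - 1] are all assumptions made for the
example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real value is architecture dependent. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical model of the submit step: pages are selected by page
 * index, so a range whose start and end bytes land in the same page
 * still selects that page, even when end < start.
 * Only meaningful for non-negative offsets.
 */
static bool sketch_range_submits_page(int64_t start, int64_t end)
{
	return (start >> SKETCH_PAGE_SHIFT) <= (end >> SKETCH_PAGE_SHIFT);
}

/*
 * Hypothetical model of the wait step: an inverted byte range is
 * treated as empty, so nothing is waited on.
 */
static bool sketch_range_waits(int64_t start, int64_t end)
{
	return end >= start;
}
```

With newsize == oldsize == 6000 and an on-disk size of 6000, the
modeled range is [6000, 5999]: both offsets fall in page 1, so the
page is submitted, but the byte range is inverted, so the wait is
skipped. If the size is page aligned (e.g. 8192), the two offsets
fall in different pages and the model submits nothing at all.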

Finally, commit 869ae85dae64 ("xfs: flush new eof page on truncate
to avoid post-eof corruption") came along to address the
aforementioned stale data exposure race. This fails to account for
the same scenario on extending truncates, for one, but can also work
against the NULL file detection logic the flush was introduced to
mitigate in the first place. This is because selectively flushing the
EOF block can update on-disk size before any preceding dirty data
may have been written back.

Since it is confusing enough to assess the intent of the current code
and the various ways it might or might not be broken, this patch
just assumes we want to flush any combination of block zeroing or
previous I/O patterns deemed susceptible to the NULL file problem,
and then tries to do that correctly. Note that the EOF block flush
cannot be removed without reintroducing the data exposure race, but
that problem is mitigated by a separate patch that moves the flush
out of truncate and into iomap processing callbacks such that it is
no longer unconditional.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_iops.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Message-Id: <20221128173945.3953659-1-bfoster@redhat.com>
From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Date: Mon, 28 Nov 2022 12:39:45 -0500
State: Deferred, archived
