diff mbox series

[19/26] xfs: don't bother storing merkle tree blocks for zeroed data blocks

Message ID 171444680689.957659.7685497436750551477.stgit@frogsfrogsfrogs (mailing list archive)
State New, archived
Headers show
Series [01/26] xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c | expand

Commit Message

Darrick J. Wong April 30, 2024, 3:29 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Now that fsverity tells our merkle tree io functions about what a hash
of a data block full of zeroes looks like, we can use this information
to avoid writing out merkle tree blocks for sparse regions of the file.
For verified gold master images this can save quite a bit of overhead.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/xfs/xfs_fsverity.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig May 1, 2024, 6:47 a.m. UTC | #1
On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Now that fsverity tells our merkle tree io functions about what a hash
> of a data block full of zeroes looks like, we can use this information
> to avoid writing out merkle tree blocks for sparse regions of the file.
> For verified gold master images this can save quite a bit of overhead.

Is this something that fsverity should be doing in a generic way?
It feels odd to have XFS behave different from everyone else here,
even if this does feel useful.  Do we also need any hash validation
that no one tampered with the metadata and added a new extent, or
is this out of scope for fsverity?
Darrick J. Wong May 1, 2024, 10:47 p.m. UTC | #2
On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Now that fsverity tells our merkle tree io functions about what a hash
> > of a data block full of zeroes looks like, we can use this information
> > to avoid writing out merkle tree blocks for sparse regions of the file.
> > For verified gold master images this can save quite a bit of overhead.
> 
> Is this something that fsverity should be doing in a generic way?

I don't think it's all that useful for ext4/f2fs because they always
write out full merkle tree blocks even if it's the zerohash over and
over again.  Old kernels aren't going to know how to deal with that.

> It feels odd to have XFS behave different from everyone else here,
> even if this does feel useful.  Do we also need any hash validation
> that no one tampered with the metadata and added a new extent, or
> is this out of scope for fsverity?

If they wrote a new extent with nonzero contents, then the validation
will fail, right?

If they added a new unwritten extent (or a written one full of zeroes),
then the file data hasn't changed and validation would still pass,
correct?

--D
Eric Biggers May 2, 2024, 12:01 a.m. UTC | #3
On Wed, May 01, 2024 at 03:47:36PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> > On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Now that fsverity tells our merkle tree io functions about what a hash
> > > of a data block full of zeroes looks like, we can use this information
> > > to avoid writing out merkle tree blocks for sparse regions of the file.
> > > For verified gold master images this can save quite a bit of overhead.
> > 
> > Is this something that fsverity should be doing in a generic way?
> 
> I don't think it's all that useful for ext4/f2fs because they always
> write out full merkle tree blocks even if it's the zerohash over and
> over again.  Old kernels aren't going to know how to deal with that.
> 
> > It feels odd to have XFS behave different from everyone else here,
> > even if this does feel useful.  Do we also need any hash validation
> > that no one tampered with the metadata and added a new extent, or
> > is this out of scope for fsverity?
> 
> If they wrote a new extent with nonzero contents, then the validation
> will fail, right?
> 
> If they added a new unwritten extent (or a written one full of zeroes),
> then the file data hasn't changed and validation would still pass,
> correct?

The point of fsverity is to verify that file data is consistent with the
top-level file digest.  It doesn't really matter which type of extent the data
came from, or if the data got synthesized somehow (e.g. zeroes synthesized from
a hole), as long as fsverity still gets invoked to verify the data.  If the data
itself passes verification, then it's good.  The same applies to Merkle tree
blocks which are an intermediate step in the verification.

In the Merkle tree, ext4 and f2fs currently just use the same concept of
sparsity as the file data, i.e. when a block is unmapped, it is filled in with
all zeroes.  As Darrick noticed, this isn't really the right concept of sparsity
for the Merkle tree, as a block full of hashes of zeroed blocks should be used,
not literally a zeroed block.  I think it makes sense to fix this in XFS, as
it's newly adding fsverity support, and this is a filesystem-level
implementation detail.  It would be difficult to fix this in ext4 and f2fs since
it would be an on-disk format upgrade.  (Existing files should not actually have
any sparse Merkle tree blocks, so we probably could redefine what they mean.
But even if so, old kernels would not be able to read the new files.)

- Eric
Darrick J. Wong May 8, 2024, 8:26 p.m. UTC | #4
On Thu, May 02, 2024 at 12:01:32AM +0000, Eric Biggers wrote:
> On Wed, May 01, 2024 at 03:47:36PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 30, 2024 at 11:47:23PM -0700, Christoph Hellwig wrote:
> > > On Mon, Apr 29, 2024 at 08:29:03PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Now that fsverity tells our merkle tree io functions about what a hash
> > > > of a data block full of zeroes looks like, we can use this information
> > > > to avoid writing out merkle tree blocks for sparse regions of the file.
> > > > For verified gold master images this can save quite a bit of overhead.
> > > 
> > > Is this something that fsverity should be doing in a generic way?
> > 
> > I don't think it's all that useful for ext4/f2fs because they always
> > write out full merkle tree blocks even if it's the zerohash over and
> > over again.  Old kernels aren't going to know how to deal with that.
> > 
> > > It feels odd to have XFS behave different from everyone else here,
> > > even if this does feel useful.  Do we also need any hash validation
> > > that no one tampered with the metadata and added a new extent, or
> > > is this out of scope for fsverity?
> > 
> > If they wrote a new extent with nonzero contents, then the validation
> > will fail, right?
> > 
> > If they added a new unwritten extent (or a written one full of zeroes),
> > then the file data hasn't changed and validation would still pass,
> > correct?
> 
> The point of fsverity is to verify that file data is consistent with the
> top-level file digest.  It doesn't really matter which type of extent the data
> came from, or if the data got synthesized somehow (e.g. zeroes synthesized from
> a hole), as long as fsverity still gets invoked to verify the data.  If the data
> itself passes verification, then it's good.  The same applies to Merkle tree
> blocks which are an intermediate step in the verification.

<nod>

> In the Merkle tree, ext4 and f2fs currently just use the same concept of
> sparsity as the file data, i.e. when a block is unmapped, it is filled in with
> all zeroes.  As Darrick noticed, this isn't really the right concept of sparsity
> for the Merkle tree, as a block full of hashes of zeroed blocks should be used,
> not literally a zeroed block.  I think it makes sense to fix this in XFS, as
> it's newly adding fsverity support, and this is a filesystem-level
> implementation detail.  It would be difficult to fix this in ext4 and f2fs since
> it would be an on-disk format upgrade.  (Existing files should not actually have
> any sparse Merkle tree blocks, so we probably could redefine what they mean.
> But even if so, old kernels would not be able to read the new files.)

<nod>

--D

> - Eric
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_fsverity.c b/fs/xfs/xfs_fsverity.c
index f6c650e81cb26..e2de99272b7da 100644
--- a/fs/xfs/xfs_fsverity.c
+++ b/fs/xfs/xfs_fsverity.c
@@ -824,6 +824,20 @@  xfs_fsverity_read_merkle(
 	/* Read the block in from disk and try to store it in the cache. */
 	xfs_fsverity_init_merkle_args(ip, &name, block->pos, &args);
 	error = xfs_attr_get(&args);
+	if (error == -ENOATTR) {
+		u8		*p;
+		unsigned int	i;
+
+		/*
+		 * No attribute found.  Synthesize a buffer full of the zero
+		 * digests on the assumption that we elided them at write time.
+		 */
+		for (i = 0, p = new_mk->data;
+		     i < block->size;
+		     i += req->digest_size, p += req->digest_size)
+			memcpy(p, req->zero_digest, req->digest_size);
+		error = 0;
+	}
 	if (error)
 		goto out_new_mk;
 
@@ -875,10 +889,23 @@  xfs_fsverity_write_merkle(
 		.valuelen		= size,
 	};
 	const char			*p;
+	unsigned int			i;
+
+	/*
+	 * If this is a block full of hashes of zeroed blocks, don't bother
+	 * storing the block.  We can synthesize them later.
+	 */
+	for (i = 0, p = buf;
+	     i < size;
+	     i += req->digest_size, p += req->digest_size)
+		if (memcmp(p, req->zero_digest, req->digest_size))
+			break;
+	if (i == size)
+		return 0;
 
 	/*
 	 * Don't store trailing zeroes, except for the first byte, which we
-	 * need to avoid ENODATA errors in the merkle read path.
+	 * need to avoid confusion with elided blocks.
 	 */
 	p = buf + size - 1;
 	while (p >= (const char *)buf && *p == 0)