[v2,0/2] Add file-system authentication to BTRFS

Message ID: 20200428105859.4719-1-jth@kernel.org

Message

Johannes Thumshirn April 28, 2020, 10:58 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

This series adds file-system authentication to BTRFS. 

Unlike other verified file-system techniques such as fs-verity, the
authenticated version of BTRFS does not need extra meta-data on disk.

This works because in BTRFS every on-disk block has a checksum. For meta-data,
the checksum is in the header of each meta-data item. For data blocks, a
separate checksum tree exists, which holds the checksums for each block.

Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
does need an authentication key. When no key, or an incorrect key, is
supplied, no valid checksum can be generated, and a read, fsck or scrub
operation will detect invalid or tampered blocks once the file-system is
mounted again with the correct key.
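
The detection property described above can be sketched in a few lines of
user-space Python. This is only an illustration under stated assumptions, not
code from the patches: the helper names block_csum and verify_block, the 4 KiB
block size and the "wrong" key are hypothetical, and the kernel computes the
checksum through its own crypto API rather than Python's hmac module.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_csum(key: bytes, block: bytes) -> bytes:
    """Return the HMAC-SHA256 tag of one block (hypothetical helper)."""
    return hmac.new(key, block, hashlib.sha256).digest()


def verify_block(key: bytes, block: bytes, stored_csum: bytes) -> bool:
    """Recompute the tag and compare it to the stored one in constant time."""
    return hmac.compare_digest(block_csum(key, block), stored_csum)


good_key = b"0123456"          # same example key as in the usage section
block = bytes(BLOCK_SIZE)      # an all-zero block stands in for real data
stored = block_csum(good_key, block)

# Correct key, untouched block: verification succeeds.
ok = verify_block(good_key, block, stored)
# Without the right key the stored checksum cannot be reproduced.
wrong_key = verify_block(b"not-the-key", block, stored)
# A modified block fails verification even with the right key.
tampered = verify_block(good_key, b"\x01" + block[1:], stored)
```

The point of the sketch is that the checksum slot stays the same size as for
plain SHA256 (32 bytes), so no additional on-disk structure is needed to hold
the authentication tag.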

Getting the key into the kernel is out of scope of this implementation; the
file-system driver assumes the key is already in the kernel's keyring at mount
time.

There was interest from the community in also using a HMAC version of Blake2b,
but as none of the crypto libraries used as a backend by the user-space BTRFS
tools currently implements a HMAC version with Blake2b, it is not (yet)
included.

I have CCed Eric Biggers and Richard Weinberger in the submission, as they
have previously worked on filesystem authentication and I hope we can get
input from them as well.

Example usage:

Create a file-system with authentication key 0123456:
    mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk

Add the key to the kernel's keyring as keyid 'btrfs:foo':
    keyctl add logon btrfs:foo 0123456 @u

Mount the fs using the 'btrfs:foo' key:
    mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point

Note, this is a re-base of the work I did when I was still at SUSE, hence the
S-o-b uses my SUSE address, while the author field uses my WDC address (to
avoid generating bouncing mails).

Changes since v1:
- None, only rebased the series

Johannes Thumshirn (2):
  btrfs: add authentication support
  btrfs: rename btrfs_parse_device_options back to
    btrfs_parse_early_options

 fs/btrfs/ctree.c                |  3 ++-
 fs/btrfs/ctree.h                |  2 ++
 fs/btrfs/disk-io.c              | 53 ++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/super.c                | 31 +++++++++++++++++++-----
 include/uapi/linux/btrfs_tree.h |  1 +
 5 files changed, 82 insertions(+), 8 deletions(-)

Comments

Eric Biggers May 1, 2020, 6:03 a.m. UTC | #1
On Tue, Apr 28, 2020 at 12:58:57PM +0200, Johannes Thumshirn wrote:
> 
> There was interest in also using a HMAC version of Blake2b from the community,
> but as none of the crypto libraries used by user-space BTRFS tools as a
> backend does currently implement a HMAC version with Blake2b, it is not (yet)
> included.

Note that BLAKE2b optionally takes a key, so using HMAC with it is unnecessary.

And the kernel crypto API's implementation of BLAKE2b already supports this.
I.e. you can call crypto_shash_setkey() directly on "blake2b-256".
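
As a user-space illustration of the same idea (a sketch only, using Python's
hashlib rather than the kernel crypto API), BLAKE2b accepts the key as a
parameter of the hash itself, and a 32-byte digest size corresponds to a
256-bit tag. The key and data values below are placeholders.

```python
import hashlib

key = b"0123456"          # placeholder key, same value as the cover letter example
data = b"block contents"  # stand-in for a file-system block

# Keyed BLAKE2b: the key is passed to the hash directly, no HMAC wrapper.
keyed = hashlib.blake2b(data, key=key, digest_size=32).digest()

# For comparison: the unkeyed digest and a digest under a different key.
unkeyed = hashlib.blake2b(data, digest_size=32).digest()
other = hashlib.blake2b(data, key=b"another key", digest_size=32).digest()
```

All three digests are 32 bytes long, and they differ from one another, so a
tag computed without the key or with a different key does not match.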

- Eric
Jason A. Donenfeld May 1, 2020, 9:26 p.m. UTC | #2
Hi Johannes,

On Tue, Apr 28, 2020 at 12:58:57PM +0200, Johannes Thumshirn wrote:
> Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
> these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
> does need an authentication key. When no, or an incorrect authentication key
> is supplied no valid checksum can be generated and a read, fsck or scrub
> operation would detect invalid or tampered blocks once the file-system is
> mounted again with the correct key. 

In case you're interested, Blake2b and Blake2s both have "keyed" modes,
which are more efficient than HMAC and achieve basically the same thing
-- they provide a PRF/MAC. There are normal crypto API interfaces for
these, and there's also an easy library interface:

#include <crypto/blake2s.h>
blake2s(output_mac, input_data, secret_key,
        output_mac_length, input_data_length, secret_key_length);

You might find that the performance of Blake2b and Blake2s is better
than HMAC-SHA2-256.

But more generally, I'm wondering about the general design and what
properties you're trying to provide. Is the block counter being hashed
in to prevent rearranging? Are there generation counters to prevent
replay/rollback?

Also, I'm wondering if this is the kind of feature you'd consider
pairing with a higher speed AEAD, and maybe in a way that would
integrate with the existing fscrypt tooling, without the need to manage
two sets of keys. Ever looked at bcachefs' design for this?
https://bcachefs.org/Encryption/

Either way, I'm happy to learn that btrfs is a filesystem with some
space baked in for authentication tags.

Jason
Johannes Thumshirn May 4, 2020, 8:39 a.m. UTC | #3
On 01/05/2020 08:03, Eric Biggers wrote:
> On Tue, Apr 28, 2020 at 12:58:57PM +0200, Johannes Thumshirn wrote:
>>
>> There was interest in also using a HMAC version of Blake2b from the community,
>> but as none of the crypto libraries used by user-space BTRFS tools as a
>> backend does currently implement a HMAC version with Blake2b, it is not (yet)
>> included.
> 
> Note that BLAKE2b optionally takes a key, so using HMAC with it is unnecessary.
> 
> And the kernel crypto API's implementation of BLAKE2b already supports this.
> I.e. you can call crypto_shash_setkey() directly on "blake2b-256".

Oh thanks for letting me know.
David Sterba May 5, 2020, 11:16 p.m. UTC | #4
On Thu, Apr 30, 2020 at 11:03:36PM -0700, Eric Biggers wrote:
> On Tue, Apr 28, 2020 at 12:58:57PM +0200, Johannes Thumshirn wrote:
> > There was interest in also using a HMAC version of Blake2b from the community,
> > but as none of the crypto libraries used by user-space BTRFS tools as a
> > backend does currently implement a HMAC version with Blake2b, it is not (yet)
> > included.
> 
> Note that BLAKE2b optionally takes a key, so using HMAC with it is unnecessary.
> 
> And the kernel crypto API's implementation of BLAKE2b already supports this.
> I.e. you can call crypto_shash_setkey() directly on "blake2b-256".

The idea behind using HMAC + checksum and not the built-in blake2b keyed
hash was to make the definitions unified and use the established crypto
primitives without algorithm-specific tweaks.

But you're right that using "blake2b-256" + setkey achieves the same; I
hadn't realized that.
David Sterba May 5, 2020, 11:38 p.m. UTC | #5
On Fri, May 01, 2020 at 03:26:48PM -0600, Jason A. Donenfeld wrote:
> > Currently BTRFS supports CRC32C, XXHASH64, SHA256 and Blake2b for checksumming
> > these blocks. This series adds a new checksum algorithm, HMAC(SHA-256), which
> > does need an authentication key. When no, or an incorrect authentication key
> > is supplied no valid checksum can be generated and a read, fsck or scrub
> > operation would detect invalid or tampered blocks once the file-system is
> > mounted again with the correct key. 
> 
> In case you're interested, Blake2b and Blake2s both have "keyed" modes,
> which are more efficient than HMAC and achieve basically the same thing
> -- they provide a PRF/MAC. There are normal crypto API interfaces for
> these, and there's also an easy library interface:
> 
> #include <crypto/blake2s.h>
> blake2s(output_mac, input_data, secret_key,
>         output_mac_length, input_data_length, secret_key_length);
> 
> You might find that the performance of Blake2b and Blake2s is better
> than HMAC-SHA2-256.

As Eric also pointed out, the keyed blake2b is suitable.

> But more generally, I'm wondering about the general design and what
> properties you're trying to provide. Is the block counter being hashed
> in to prevent rearranging? Are there generation counters to prevent
> replay/rollback?

Hopefully the details will be covered in the next iteration, but let me
give you at least some information.

The metadata blocks contain a logical block address and generation.
(https://elixir.bootlin.com/linux/latest/source/fs/btrfs/ctree.h#L161)
The generation is incremented by one each time the superblock (and thus
the transaction epoch) is written. The block number changes when it is
COWed. The metadata block (sizes range from 4k to 64k) is checksummed from
the 'fsid' member to the end of the block, i.e. including the generation
and block address.
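
To make the coverage concrete, here is a hedged user-space sketch in Python.
The field layout is loosely modelled on the metadata block header and should
be treated as an assumption, as should the helper names build_block and
verify_block, the 4k block size, and the idea that the reader already knows
which address and generation to expect. It only shows that a keyed checksum
running from 'fsid' to the end of the block binds the block to its logical
address and generation.

```python
import hashlib
import hmac
import struct

CSUM_SIZE = 32    # checksum slot at the start of the block
NODE_SIZE = 4096  # assumed metadata block size (4k, the smallest mentioned)
# Assumed layout: csum, fsid, bytenr, flags, chunk_tree_uuid,
# generation, owner, nritems, level (little-endian, packed).
HEADER = struct.Struct("<32s16sQQ16sQQIB")


def build_block(key, fsid, bytenr, generation, payload=b""):
    """Build a block and store the keyed checksum of everything after csum."""
    header = HEADER.pack(bytes(CSUM_SIZE), fsid, bytenr, 0,
                         bytes(16), generation, 0, 0, 0)
    body = (header + payload).ljust(NODE_SIZE, b"\0")
    csum = hmac.new(key, body[CSUM_SIZE:], hashlib.sha256).digest()
    return csum + body[CSUM_SIZE:]


def verify_block(key, block, expected_bytenr, expected_generation):
    """Check the checksum, then the address and generation it covers."""
    csum = hmac.new(key, block[CSUM_SIZE:], hashlib.sha256).digest()
    if not hmac.compare_digest(csum, block[:CSUM_SIZE]):
        return False
    _, _, bytenr, _, _, generation, _, _, _ = HEADER.unpack_from(block)
    return bytenr == expected_bytenr and generation == expected_generation


key = b"0123456"
fsid = bytes(range(16))
blk = build_block(key, fsid, 0x100000, 7)

valid = verify_block(key, blk, 0x100000, 7)
# Same bytes presented at a different logical address: header mismatch.
moved = verify_block(key, blk, 0x200000, 7)
# Old copy presented where a newer generation is expected.
stale = verify_block(key, blk, 0x100000, 8)
# Patching the address field (bytes 48..55) without the key breaks the checksum.
patched = blk[:48] + struct.pack("<Q", 0x200000) + blk[56:]
forged = verify_block(key, patched, 0x200000, 7)
```

In this sketch only the first check succeeds; the relocated, stale and
patched blocks are all rejected.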

The mapping between physical blocks on devices and the logical addresses is
stored in a separate b-tree, as dedicated items in metadata blocks, so
there's inherent checksumming of that information.

The data blocks themselves have a detached checksum stored in the checksum
tree, again inside items in metadata blocks.

The last remaining part is the superblock and that is being discussed in
https://lore.kernel.org/linux-btrfs/20200505221448.GW18421@twin.jikos.cz/