[v3,3/3] btrfs: document btrfs authentication

Message ID 20200514092415.5389-4-jth@kernel.org (mailing list archive)
State New, archived
Series Add file-system authentication to BTRFS

Commit Message

Johannes Thumshirn May 14, 2020, 9:24 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Document the design, guarantees and limitations of an authenticated BTRFS
file-system.

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
 1 file changed, 168 insertions(+)
 create mode 100644 Documentation/filesystems/btrfs-authentication.rst

Comments

Jonathan Corbet May 14, 2020, 12:26 p.m. UTC | #1
On Thu, 14 May 2020 11:24:15 +0200
Johannes Thumshirn <jth@kernel.org> wrote:

Quick question...

> Document the design, guarantees and limitations of an authenticated BTRFS
> file-system.
> 
> Cc: Jonathan Corbet <corbet@lwn.net>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
>  1 file changed, 168 insertions(+)
>  create mode 100644 Documentation/filesystems/btrfs-authentication.rst
> 
> diff --git a/Documentation/filesystems/btrfs-authentication.rst b/Documentation/filesystems/btrfs-authentication.rst
> new file mode 100644
> index 000000000000..f13cab248fc0
> --- /dev/null
> +++ b/Documentation/filesystems/btrfs-authentication.rst
> @@ -0,0 +1,168 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +:orphan:
> +

Why mark this "orphan" rather than just adding it to index.rst so it gets
built with the rest of the docs?

Thanks,

jon
Johannes Thumshirn May 14, 2020, 2:54 p.m. UTC | #2
On 14/05/2020 14:26, Jonathan Corbet wrote:
> On Thu, 14 May 2020 11:24:15 +0200
> Johannes Thumshirn <jth@kernel.org> wrote:
> 
> Quick question...
> 
>> Document the design, guarantees and limitations of an authenticated BTRFS
>> file-system.
>>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  .../filesystems/btrfs-authentication.rst      | 168 ++++++++++++++++++
>>  1 file changed, 168 insertions(+)
>>  create mode 100644 Documentation/filesystems/btrfs-authentication.rst
>>
>> diff --git a/Documentation/filesystems/btrfs-authentication.rst b/Documentation/filesystems/btrfs-authentication.rst
>> new file mode 100644
>> index 000000000000..f13cab248fc0
>> --- /dev/null
>> +++ b/Documentation/filesystems/btrfs-authentication.rst
>> @@ -0,0 +1,168 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +:orphan:
>> +
> 
> Why mark this "orphan" rather than just adding it to index.rst so it gets
> built with the rest of the docs?
>
I've no idea of rst and the ubifs-authentication.rst which I had open at the
time did have this as well, so I blindly copied it. Thanks for spotting, will
remove in the next iteration.
Richard Weinberger May 14, 2020, 3:14 p.m. UTC | #3
----- Original Message -----
>> Why mark this "orphan" rather than just adding it to index.rst so it gets
>> built with the rest of the docs?
>>
> I've no idea of rst and the ubifs-authentication.rst which I had open at the
> time did have this as well, so I blindly copied it. Thanks for spotting, will
> remove in the next iteration.

Well, the original ubifs-authentication documentation was written in markdown
(which is IMHO much less pain to write), later it was converted to rst by:

commit 09f4c750a8c7d1fc0b7bb3a7aa1de55de897a375
Author: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Date:   Fri Jul 26 09:51:14 2019 -0300

    docs: ubifs-authentication.md: convert to ReST
    
    The documentation standard is ReST and not markdown.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
    Acked-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Jonathan Corbet <corbet@lwn.net>

But I have no idea what this orphan thingy is.

Thanks,
//richard
Jonathan Corbet May 14, 2020, 4 p.m. UTC | #4
On Thu, 14 May 2020 17:14:36 +0200 (CEST)
Richard Weinberger <richard@nod.at> wrote:

> But I have no idea what this orphan thingy is.

It suppresses a warning from Sphinx that the file is not included in the
docs build.  Mauro did that with a lot of his conversions just to make his
life easier at the time, but it's not really something we want going
forward.

jon
Richard Weinberger May 14, 2020, 4:05 p.m. UTC | #5
----- Original Message -----
> From: "Jonathan Corbet" <corbet@lwn.net>
> To: "richard" <richard@nod.at>
> CC: "Johannes Thumshirn" <Johannes.Thumshirn@wdc.com>, "Johannes Thumshirn" <jth@kernel.org>, "David Sterba"
> <dsterba@suse.cz>, "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "linux-btrfs" <linux-btrfs@vger.kernel.org>, "Eric
> Biggers" <ebiggers@google.com>
> Sent: Thursday, May 14, 2020 18:00:18
> Subject: Re: [PATCH v3 3/3] btrfs: document btrfs authentication

> On Thu, 14 May 2020 17:14:36 +0200 (CEST)
> Richard Weinberger <richard@nod.at> wrote:
> 
>> But I have no idea what this orphan thingy is.
> 
> It suppresses a warning from Sphinx that the file is not included in the
> docs build.  Mauro did that with a lot of his conversions just to make his
> life easier at the time, but it's not really something we want going
> forward.

Ahh, thanks for explaining. :-)

Thanks,
//richard
David Sterba May 24, 2020, 7:55 p.m. UTC | #6
On Thu, May 14, 2020 at 11:24:15AM +0200, Johannes Thumshirn wrote:
> +User-data
> +~~~~~~~~~
> +
> +The checksums for the user or file-data are stored in a separate b-tree, the
> +checksum tree. As this tree in itself is authenticated, only the data stored
> +in it needs to be authenticated. This is done by replacing the checksums
> +stored on disk by the cryptographically secure keyed hash algorithm used for
> +the super-block and other meta-data. So each written file block will get
> +checksummed with the authentication key and without supplying the correct key
> +it is impossible to write data on disk, which can be read back without
> +failing the authentication test. If this test is failed, an I/O error is
> +reported back to the user.

With same key K and same contents of data block B, the keyed hash on two
different filesystems is the same. Ie. there's no per-filesystem salt
(like a UUID) or per-transaction salt (generation, block address).

For metadata the per-transaction salt is inherently there as the hash is
calculated with the header included (containing the increasing
generation) and the filesystem UUID (available via blkid) or chunk tree
UUID (not so easy for the user to read).

So there's an obvious discrepancy in the additional data besides the
variable contents of the data and metadata blocks.

The weakness of the data blocks may aid some attacks (I don't have a
concrete suggestion where and how exactly).

Suggested fix is to have a data block "header", with similar contents as
the metadata blocks, eg.

struct btrfs_hash_header {
	u8 fsid[BTRFS_FSID_SIZE];
	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
	__le64 generation;
};

Perhaps also with some extra item for future extensions, set to zeros
for now.
Johannes Thumshirn May 25, 2020, 10:57 a.m. UTC | #7
On 24/05/2020 21:56, David Sterba wrote:
> On Thu, May 14, 2020 at 11:24:15AM +0200, Johannes Thumshirn wrote:
>> +User-data
>> +~~~~~~~~~
>> +
>> +The checksums for the user or file-data are stored in a separate b-tree, the
>> +checksum tree. As this tree in itself is authenticated, only the data stored
>> +in it needs to be authenticated. This is done by replacing the checksums
>> +stored on disk by the cryptographically secure keyed hash algorithm used for
>> +the super-block and other meta-data. So each written file block will get
>> +checksummed with the authentication key and without supplying the correct key
>> +it is impossible to write data on disk, which can be read back without
>> +failing the authentication test. If this test is failed, an I/O error is
>> +reported back to the user.
> 
> With same key K and same contents of data block B, the keyed hash on two
> different filesystems is the same. Ie. there's no per-filesystem salt
> (like a UUID) or per-transaction salt (generation, block address).

Correct.

> 
> For metadata the per-transaction salt is inherently there as the hash is
> calculated with the header included (containing the increasing
> generation) and the filesystem UUID (available via blkid) or chunk tree
> UUID (not so easy for the user to read).
> 
> So there's an obvious discrepancy in the additional data besides the
> variable contents of the data and metadata blocks.
> 
> The weakness of the data blocks may aid some attacks (I don't have a
> concrete suggestion where and how exactly).

Yes but wouldn't this also need a hash that is prone to a known plaintext
attack or that has known collisions? But it would probably help in 
brute-forcing the key K of the filesystem. OTOH fsid, generation and the 
chunk-tree UUID can be read in plaintext from the FS as well so this would
only mitigate a rainbow table like attack, wouldn't it?

> 
> Suggested fix is to have a data block "header", with similar contents as
> the metadata blocks, eg.
> 
> struct btrfs_hash_header {
> 	u8 fsid[BTRFS_FSID_SIZE];
> 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
> 	__le64 generation;
> };
> 
> Perhaps also with some extra item for future extensions, set to zeros
> for now.
> 

This addition would be possible, yes. But if we'd add this header to every
checksum in the checksum tree it would be an incompatible on-disk format
change.

We could add this only for authenticated filesystems though, but would this
deviation make sense? I need to think more about it (and actually look at the
code to see how this could be done).
David Sterba May 25, 2020, 11:26 a.m. UTC | #8
On Mon, May 25, 2020 at 10:57:13AM +0000, Johannes Thumshirn wrote:
> On 24/05/2020 21:56, David Sterba wrote:
> > On Thu, May 14, 2020 at 11:24:15AM +0200, Johannes Thumshirn wrote:
> > For metadata the per-transaction salt is inherently there as the hash is
> > calculated with the header included (containing the increasing
> > generation) and the filesystem UUID (available via blkid) or chunk tree
> > UUID (not so easy for the user to read).
> > 
> > So there's an obvious discrepancy in the additional data besides the
> > variable contents of the data and metadata blocks.
> > 
> > The weakness of the data blocks may aid some attacks (I don't have a
> > concrete suggestion where and how exactly).
> 
> Yes but wouldn't this also need a hash that is prone to a known plaintext
> attack or that has known collisions? But it would probably help in 
> brute-forcing the key K of the filesystem. OTOH fsid, generation and the 
> chunk-tree UUID can be read in plaintext from the FS as well so this would
> only mitigate a rainbow table like attack, wouldn't it?

The goal here is to make attacks harder at a small cost.

> > Suggested fix is to have a data block "header", with similar contents as
> > the metadata blocks, eg.
> > 
> > struct btrfs_hash_header {
> > 	u8 fsid[BTRFS_FSID_SIZE];
> > 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
> > 	__le64 generation;
> > };
> > 
> > Perhaps also with some extra item for future extensions, set to zeros
> > for now.
> 
> This addition would be possible, yes. But if we'd add this header to every
> checksum in the checksum tree it would be an incompatible on-disk format
> change.

No. It's only in-memory and is built from known pieces of information
exactly to avoid storing it on disk.
Johannes Thumshirn May 25, 2020, 11:44 a.m. UTC | #9
On 25/05/2020 13:27, David Sterba wrote:
>>> Suggested fix is to have a data block "header", with similar contents as
>>> the metadata blocks, eg.
>>>
>>> struct btrfs_hash_header {
>>> 	u8 fsid[BTRFS_FSID_SIZE];
>>> 	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
>>> 	__le64 generation;
>>> };
>>>
>>> Perhaps also with some extra item for future extensions, set to zeros
>>> for now.
>>
>> This addition would be possible, yes. But if we'd add this header to every
>> checksum in the checksum tree it would be an incompatible on-disk format
>> change.
> 
> No. It's only in-memory and is built from known pieces of information
> exactly to avoid storing it on disk.
> 

Ah OK, now I get what you meant. This should then be only for the authenticated 
FS I guess.
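
For illustration, the in-memory header discussed above could be assembled
roughly as follows. This is only a userspace sketch under stated assumptions:
the helper name hash_header_pack(), the flat byte layout, and the
HASH_HEADER_LEN constant are hypothetical and not taken from the patch series.
Only the three fields come from the suggested struct btrfs_hash_header. The
packed bytes would be fed to the keyed hash ahead of the data block and never
written to disk; the hash computation itself is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTRFS_FSID_SIZE 16
#define BTRFS_UUID_SIZE 16
/* Hypothetical: fsid + chunk tree UUID + 64-bit generation. */
#define HASH_HEADER_LEN (BTRFS_FSID_SIZE + BTRFS_UUID_SIZE + 8)

static_assert(HASH_HEADER_LEN == 40, "unexpected hash header length");

/*
 * Hypothetical helper: pack the per-filesystem and per-transaction salt
 * into a fixed byte layout. Nothing here is stored on disk; the buffer
 * only serves as extra input to the keyed hash of a data block.
 */
static void hash_header_pack(uint8_t out[HASH_HEADER_LEN],
			     const uint8_t fsid[BTRFS_FSID_SIZE],
			     const uint8_t chunk_tree_uuid[BTRFS_UUID_SIZE],
			     uint64_t generation)
{
	size_t i;

	memcpy(out, fsid, BTRFS_FSID_SIZE);
	memcpy(out + BTRFS_FSID_SIZE, chunk_tree_uuid, BTRFS_UUID_SIZE);

	/* Encode the generation as little-endian, matching __le64. */
	for (i = 0; i < 8; i++)
		out[BTRFS_FSID_SIZE + BTRFS_UUID_SIZE + i] =
			(uint8_t)(generation >> (8 * i));
}
```

With such a header hashed in front of each data block, the same key K and the
same block contents B no longer produce the same keyed hash on two different
filesystems, since the fsid differs.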

Patch

diff --git a/Documentation/filesystems/btrfs-authentication.rst b/Documentation/filesystems/btrfs-authentication.rst
new file mode 100644
index 000000000000..f13cab248fc0
--- /dev/null
+++ b/Documentation/filesystems/btrfs-authentication.rst
@@ -0,0 +1,168 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+:orphan:
+
+.. BTRFS Authentication
+.. Western Digital or its affiliates
+.. 2020
+
+Introduction
+============
+
+This document describes an approach to get file contents _and_ full meta-data
+authentication for BTRFS.
+
+This is possible because BTRFS uses checksums embedded in its on-disk
+meta-data structures as well as checksums for each individual extent of data.
+The primary intent of these checksums was to detect bit-flips of data at rest.
+But this mechanism can be extended to provide authentication of all on-disk
+data when the checksum algorithm is replaced by a cryptographically secure
+keyed hash.
+
+BTRFS Data Structures
+---------------------
+
+BTRFS utilizes a special copy-on-write b-tree to store all contents of the
+file-system on disk.
+
+Meta-Data
+~~~~~~~~~
+
+On-disk meta-data in BTRFS is stored in copy-on-write b-trees. These b-trees
+are built using two data structures, ``struct btrfs_node`` and ``struct
+btrfs_leaf``. Both of these structures start with a ``struct btrfs_header``.
+This structure has amongst other fields a checksum field, protecting its
+contents. As the checksum is the first entry in the structure, the whole
+structure is protected by this checksum. The superblock (``struct
+btrfs_super_block``) is the first on-disk structure which is read on mount and
+it as well starts with a checksum field protecting the rest of the structure.
+The super block is also needed to read the addresses of the other file system
+b-trees, so their location on disk is protected by the checksum.
+
+::
+
+          BTRFS Header
+          +------+------+--------+-------+-----------------+----+-------+---------+
+          | csum | fsid | bytenr | flags | chunk_tree_uuid | gen| owner | nritems |
+          +------+------+--------+-------+-----------------+----+-------+---------+
+          BTRFS Node
+          +--------+-------------+-------------+-----+
+          | Header | key pointer | key pointer | ... |
+          +--------+-------------+-------------+-----+
+          BTRFS Leaf
+          +--------+------+------+-----+
+          | Header | item | item | ... |
+          +--------+------+------+-----+
+
+            Figure 1: BTRFS Header, Node and Leaf data structures
+
+User-Data
+~~~~~~~~~
+
+User data in BTRFS is also protected by checksums, but this checksum is not
+stored alongside the data, as it is with meta-data, but stored in a separate
+b-tree, the checksum tree. The leafs of this tree store the checksums of the
+user-data.  The tree nodes and leafs are of ``struct btrfs_node`` or ``struct
+btrfs_leaf``, so integrity of this tree is protected as well.
+
+BTRFS Authentication
+====================
+
+This chapter introduces BTRFS authentication which enables BTRFS to verify
+the authenticity and integrity of metadata and file contents stored on disk.
+
+Threat Model
+------------
+
+BTRFS authentication enables detection of offline data modification. While it
+does not prevent it, it enables (trusted) code to check the integrity and
+authenticity of on-disk file contents and filesystem metadata. This covers
+attacks where file contents are swapped.
+
+BTRFS authentication will not protect against a rollback of full disk
+contents. Ie. an attacker can still dump the disk and restore it at a later
+time without detection. It will also not protect against a rollback of one
+transaction. That means an attacker is able to partially undo changes. This is
+possible, because BTRFS does not immediately overwrite obsolete versions of
+its meta-data but keeps older generations until they get garbage collected.
+
+BTRFS authentication does not cover attacks where an attacker is able to
+execute code on the device after the authentication key was provided.
+Additional measures like secure boot and trusted boot have to be taken to
+ensure that only trusted code is executed on a device.
+
+As the file-system authentication key is also needed to update data structures
+on disk, the key has to be in the kernel's keyring for the whole time the
+file-system is mounted. An attacker that is able to compromise the kernel may
+be able to extract the key from the kernel's keyring and thus gain the
+ability to modify the file-system later on.
+
+Authentication
+--------------
+
+To be able to fully trust data read from disk, all BTRFS data structures
+stored on disk are authenticated. That is:
+
+- The super blocks
+- The file-system b-trees
+- The user-data
+
+
+Super-block
+~~~~~~~~~~~
+
+In order to be able to authenticate the file-system's super-block, the
+checksum stored in the checksum field at the beginning of ``struct
+btrfs_super_block`` protecting its contents is replaced by a
+cryptographically secure keyed hash. In order to generate a valid super-block
+or to validate the super-block, one has to provide a key as an additional
+input for the hash function. The super-block is the starting point to read all
+on disk tree structures, so if we cannot trust the authenticity of the
+super-block anymore, we cannot trust the whole file-system.
+
+B-Trees
+~~~~~~~
+
+Starting from the super-block's root-tree root, the root tree holds the b-tree
+roots of all other on disk b-trees. All other file-system meta-data can be
+derived from the trees stored in this tree. As all b-trees in BTRFS are built
+using ``struct btrfs_node`` and ``struct btrfs_leaf`` each building block of
+each tree is checksummed. These checksums are replaced with the cryptographically
+secure keyed hash algorithm and the authentication key used to verify the
+super-block in the mount phase. Without this key it is impossible to alter any
+of the file-system structure without generating invalid hashes.
+
+User-data
+~~~~~~~~~
+
+The checksums for the user or file-data are stored in a separate b-tree, the
+checksum tree. As this tree in itself is authenticated, only the data stored
+in it needs to be authenticated. This is done by replacing the checksums
+stored on disk by the cryptographically secure keyed hash algorithm used for
+the super-block and other meta-data. So each written file block will get
+checksummed with the authentication key and without supplying the correct key
+it is impossible to write data on disk, which can be read back without
+failing the authentication test. If this test fails, an I/O error is
+reported back to the user.
+
+Key Management
+--------------
+
+For simplicity, BTRFS authentication uses a single key to compute the keyed
+hashes of the super-block, b-tree nodes and leafs as well as file-blocks. This
+key has to be available on creation of the file-system (`mkfs.btrfs`) to
+authenticate all b-tree elements and the super-blocks. Further, it has to be
+available on mount of the file-system to verify the meta-data and user-data
+stored in the file-system.
+
+Limitations
+-----------
+
+As some optional features of BTRFS disable the generation of checksums, these
+features are incompatible with an authenticated BTRFS.
+These features are:
+- nodatacow
+- nodatasum
+
+Any offline modification of the file-system, like setting an FS label while
+the FS is unmounted, is incompatible as well.