diff mbox series

[v2,1/2] btrfs: add authentication support

Message ID 20200428105859.4719-2-jth@kernel.org (mailing list archive)
State New, archived
Headers show
Series Add file-system authentication to BTRFS | expand

Commit Message

Johannes Thumshirn April 28, 2020, 10:58 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Add authentication support for a BTRFS file-system.

This works, because in BTRFS every meta-data block as well as every
data-block has a own checksum. For meta-data the checksum is in the
meta-data node itself. For data blocks, the checksums are stored in the
checksum tree.

When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256),
a key is needed to mount a verified file-system. This key also needs to be
used at file-system creation time.

We have to used a keyed hash scheme, in contrast to doing a normal
cryptographic hash, to guarantee integrity of the file system, as a
potential attacker could just replay file-system operations and the
changes would go unnoticed.

Having a keyed hash only on the topmost Node of a tree or even just in the
super-block and using cryptographic hashes on the normal meta-data nodes
and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes
do not include the checksums of their respective child nodes, but only the
block pointers and offsets where to find them on disk.

Also note, we do not need a incompat R/O flag for this, because if an old
kernel tries to mount an authenticated file-system it will fail the
initial checksum type verification and thus refuses to mount.

The key has to be supplied by the kernel's keyring and the method of
getting the key securely into the kernel is not subject of this patch.

Example usage:
Create a file-system with authentication key 0123456
mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk

Add the key to the kernel's keyring as keyid 'btrfs:foo'
keyctl add logon btrfs:foo 0123456 @u

Mount the fs using the 'btrfs:foo' key
mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/ctree.c                |  3 ++-
 fs/btrfs/ctree.h                |  2 ++
 fs/btrfs/disk-io.c              | 53 ++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/super.c                | 24 ++++++++++++++++---
 include/uapi/linux/btrfs_tree.h |  1 +
 5 files changed, 78 insertions(+), 5 deletions(-)

Comments

kernel test robot April 29, 2020, 7:23 a.m. UTC | #1
Hi Johannes,

I love your patch! Yet something to improve:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on v5.7-rc3 next-20200428]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Add-file-system-authentication-to-BTRFS/20200429-030930
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: x86_64-randconfig-d002-20200428 (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project f30416fdde922eaa655934e050026930fefbd260)
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> fs/btrfs/disk-io.c:2227:8: error: implicit declaration of function 'request_key' [-Werror,-Wimplicit-function-declaration]
           key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
                 ^
>> fs/btrfs/disk-io.c:2227:21: error: use of undeclared identifier 'key_type_logon'
           key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
                              ^
>> fs/btrfs/disk-io.c:2231:16: error: incomplete definition of type 'struct key'
           down_read(&key->sem);
                      ~~~^
   include/linux/key.h:33:8: note: forward declaration of 'struct key'
   struct key;
          ^
>> fs/btrfs/disk-io.c:2233:8: error: implicit declaration of function 'user_key_payload_locked' [-Werror,-Wimplicit-function-declaration]
           ukp = user_key_payload_locked(key);
                 ^
>> fs/btrfs/disk-io.c:2233:6: warning: incompatible integer to pointer conversion assigning to 'const struct user_key_payload *' from 'int' [-Wint-conversion]
           ukp = user_key_payload_locked(key);
               ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/btrfs/disk-io.c:2240:52: error: incomplete definition of type 'struct user_key_payload'
           err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
                                                          ~~~^
   fs/btrfs/disk-io.c:2193:15: note: forward declaration of 'struct user_key_payload'
           const struct user_key_payload *ukp;
                        ^
   fs/btrfs/disk-io.c:2240:63: error: incomplete definition of type 'struct user_key_payload'
           err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
                                                                     ~~~^
   fs/btrfs/disk-io.c:2193:15: note: forward declaration of 'struct user_key_payload'
           const struct user_key_payload *ukp;
                        ^
   fs/btrfs/disk-io.c:2249:14: error: incomplete definition of type 'struct key'
           up_read(&key->sem);
                    ~~~^
   include/linux/key.h:33:8: note: forward declaration of 'struct key'
   struct key;
          ^
   1 warning and 7 errors generated.

vim +/request_key +2227 fs/btrfs/disk-io.c

  2187	
  2188	static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
  2189	{
  2190		struct crypto_shash *csum_shash;
  2191		const char *csum_driver = btrfs_super_csum_driver(csum_type);
  2192		struct key *key;
  2193		const struct user_key_payload *ukp;
  2194		int err = 0;
  2195	
  2196		csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
  2197	
  2198		if (IS_ERR(csum_shash)) {
  2199			btrfs_err(fs_info, "error allocating %s hash for checksum",
  2200				  csum_driver);
  2201			return PTR_ERR(csum_shash);
  2202		}
  2203	
  2204		fs_info->csum_shash = csum_shash;
  2205	
  2206		/*
  2207		 * if we're not doing authentication, we're done by now. Still we have
  2208		 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
  2209		 * keyed hashes.
  2210		 */
  2211		if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
  2212		    !btrfs_test_opt(fs_info, AUTH_KEY)) {
  2213			crypto_free_shash(fs_info->csum_shash);
  2214			return -EINVAL;
  2215		} else if (btrfs_test_opt(fs_info, AUTH_KEY)
  2216			   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
  2217			crypto_free_shash(fs_info->csum_shash);
  2218			return -EINVAL;
  2219		} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
  2220			/*
  2221			 * This is the normal case, if noone want's authentication and
  2222			 * doesn't have a keyed hash, we're done.
  2223			 */
  2224			return 0;
  2225		}
  2226	
> 2227		key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
  2228		if (IS_ERR(key))
  2229			return PTR_ERR(key);
  2230	
> 2231		down_read(&key->sem);
  2232	
> 2233		ukp = user_key_payload_locked(key);
  2234		if (!ukp) {
  2235			btrfs_err(fs_info, "");
  2236			err = -EKEYREVOKED;
  2237			goto out;
  2238		}
  2239	
> 2240		err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
  2241		if (err)
  2242			btrfs_err(fs_info, "error setting key %s for verification",
  2243				  fs_info->auth_key_name);
  2244	
  2245	out:
  2246		if (err)
  2247			crypto_free_shash(fs_info->csum_shash);
  2248	
  2249		up_read(&key->sem);
  2250		key_put(key);
  2251	
  2252		return err;
  2253	}
  2254	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Johannes Thumshirn April 29, 2020, 11:46 a.m. UTC | #2
I've forgot to amend some fixes before sending out, will repost soon.
Eric Biggers May 1, 2020, 5:39 a.m. UTC | #3
On Tue, Apr 28, 2020 at 12:58:58PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Add authentication support for a BTRFS file-system.
> 
> This works, because in BTRFS every meta-data block as well as every
> data-block has a own checksum. For meta-data the checksum is in the
> meta-data node itself. For data blocks, the checksums are stored in the
> checksum tree.
> 
> When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256),
> a key is needed to mount a verified file-system. This key also needs to be
> used at file-system creation time.
> 
> We have to used a keyed hash scheme, in contrast to doing a normal
> cryptographic hash, to guarantee integrity of the file system, as a
> potential attacker could just replay file-system operations and the
> changes would go unnoticed.
> 
> Having a keyed hash only on the topmost Node of a tree or even just in the
> super-block and using cryptographic hashes on the normal meta-data nodes
> and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes
> do not include the checksums of their respective child nodes, but only the
> block pointers and offsets where to find them on disk.
> 
> Also note, we do not need a incompat R/O flag for this, because if an old
> kernel tries to mount an authenticated file-system it will fail the
> initial checksum type verification and thus refuses to mount.
> 
> The key has to be supplied by the kernel's keyring and the method of
> getting the key securely into the kernel is not subject of this patch.

This is a good idea, but can you explain exactly what security properties you
aim to achieve?

As far as I can tell, btrfs hashes each data block individually.  There's no
contextual information about where eaech block is located or which file(s) it
belongs to.  So, with this proposal, an attacker can still replace any data
block with any other data block.  Is that what you have in mind?  Have you
considered including contextual information in the hashes, to prevent this?

What about metadata blocks -- how well are they authenticated?  Can they be
moved around?  And does this proposal authenticate *everything* on the
filesystem, or is there any part that is missed?  What about the superblock?

You also mentioned preventing replay of filesystem operations, which suggests
you're trying to achieve rollback protection.  But actually this scheme doesn't
provide rollback protection.  For one, an attacker can always just rollback the
entire filesystem to a previous state.

This feature would still be useful even with the above limitations.  But what is
your goal exactly?  Can this be made better?

> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d10c7be10f3b..fe403fb62178 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/error-injection.h>
>  #include <linux/crc32c.h>
>  #include <linux/sched/mm.h>
> +#include <keys/user-type.h>
>  #include <asm/unaligned.h>
>  #include <crypto/hash.h>
>  #include "ctree.h"
> @@ -339,6 +340,7 @@ static bool btrfs_supported_super_csum(u16 csum_type)
>  	case BTRFS_CSUM_TYPE_XXHASH:
>  	case BTRFS_CSUM_TYPE_SHA256:
>  	case BTRFS_CSUM_TYPE_BLAKE2:
> +	case BTRFS_CSUM_TYPE_HMAC_SHA256:
>  		return true;
>  	default:
>  		return false;
> @@ -2187,6 +2189,9 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>  {
>  	struct crypto_shash *csum_shash;
>  	const char *csum_driver = btrfs_super_csum_driver(csum_type);
> +	struct key *key;
> +	const struct user_key_payload *ukp;
> +	int err = 0;
>  
>  	csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
>  
> @@ -2198,7 +2203,53 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>  
>  	fs_info->csum_shash = csum_shash;
>  
> -	return 0;
> +	/*
> +	 * if we're not doing authentication, we're done by now. Still we have
> +	 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
> +	 * keyed hashes.
> +	 */
> +	if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
> +	    !btrfs_test_opt(fs_info, AUTH_KEY)) {
> +		crypto_free_shash(fs_info->csum_shash);
> +		return -EINVAL;

Seems there should be an error message here that says that a key is needed.

> +	} else if (btrfs_test_opt(fs_info, AUTH_KEY)
> +		   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
> +		crypto_free_shash(fs_info->csum_shash);
> +		return -EINVAL;

The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
gets to choose it for you among all the supported keyed hash algorithms, as soon
as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
does?

> +	} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
> +		/*
> +		 * This is the normal case, if noone want's authentication and
> +		 * doesn't have a keyed hash, we're done.
> +		 */
> +		return 0;
> +	}
> +
> +	key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
> +	if (IS_ERR(key))
> +		return PTR_ERR(key);

Seems this should print an error message if the key can't be found.

Also, this should enforce that the key description uses a service prefix by
formatting it as kasprintf("btrfs:%s", fs_info->auth_key_name).  Otherwise
people can abuse this feature to use keys that were added for other kernel
features.  (This already got screwed up elsewhere, which makes the "logon" key
type a bit of a disaster.  But there's no need to make it worse.)

- Eric
Eric Biggers May 1, 2020, 6:30 a.m. UTC | #4
On Thu, Apr 30, 2020 at 10:39:08PM -0700, Eric Biggers wrote:
> On Tue, Apr 28, 2020 at 12:58:58PM +0200, Johannes Thumshirn wrote:
> > From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > 
> > Add authentication support for a BTRFS file-system.
> > 
> > This works, because in BTRFS every meta-data block as well as every
> > data-block has a own checksum. For meta-data the checksum is in the
> > meta-data node itself. For data blocks, the checksums are stored in the
> > checksum tree.
> > 
> > When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256),
> > a key is needed to mount a verified file-system. This key also needs to be
> > used at file-system creation time.
> > 
> > We have to used a keyed hash scheme, in contrast to doing a normal
> > cryptographic hash, to guarantee integrity of the file system, as a
> > potential attacker could just replay file-system operations and the
> > changes would go unnoticed.
> > 
> > Having a keyed hash only on the topmost Node of a tree or even just in the
> > super-block and using cryptographic hashes on the normal meta-data nodes
> > and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes
> > do not include the checksums of their respective child nodes, but only the
> > block pointers and offsets where to find them on disk.
> > 
> > Also note, we do not need a incompat R/O flag for this, because if an old
> > kernel tries to mount an authenticated file-system it will fail the
> > initial checksum type verification and thus refuses to mount.
> > 
> > The key has to be supplied by the kernel's keyring and the method of
> > getting the key securely into the kernel is not subject of this patch.
> 
> This is a good idea, but can you explain exactly what security properties you
> aim to achieve?
> 
> As far as I can tell, btrfs hashes each data block individually.  There's no
> contextual information about where eaech block is located or which file(s) it
> belongs to.  So, with this proposal, an attacker can still replace any data
> block with any other data block.  Is that what you have in mind?  Have you
> considered including contextual information in the hashes, to prevent this?
> 
> What about metadata blocks -- how well are they authenticated?  Can they be
> moved around?  And does this proposal authenticate *everything* on the
> filesystem, or is there any part that is missed?  What about the superblock?
> 
> You also mentioned preventing replay of filesystem operations, which suggests
> you're trying to achieve rollback protection.  But actually this scheme doesn't
> provide rollback protection.  For one, an attacker can always just rollback the
> entire filesystem to a previous state.
> 
> This feature would still be useful even with the above limitations.  But what is
> your goal exactly?  Can this be made better?

btrfs also has an inode flag BTRFS_INODE_NODATASUM, which looks scary as it
results in the file being unauthenticated.  Presumably the authentication of the
filesystem metadata is supposed to prevent this flag from being maliciously
cleared?  It might be a good idea to forbid this flag if the filesystem is using
the authentication feature.

- Eric
Johannes Thumshirn May 4, 2020, 8:38 a.m. UTC | #5
On 01/05/2020 08:30, Eric Biggers wrote:
> btrfs also has an inode flag BTRFS_INODE_NODATASUM, which looks scary as it
> results in the file being unauthenticated.  Presumably the authentication of the
> filesystem metadata is supposed to prevent this flag from being maliciously
> cleared?  It might be a good idea to forbid this flag if the filesystem is using
> the authentication feature.

Yes indeed, authentication and nodatasum must be mutually exclusive.

Thanks for spotting.
Johannes Thumshirn May 4, 2020, 10:09 a.m. UTC | #6
On 01/05/2020 07:39, Eric Biggers wrote:
[...]

Thanks for taking the time to look through this.

> 
> This is a good idea, but can you explain exactly what security properties you
> aim to achieve?

My goal is to protect the file-system against offline modifications. 
Offline in this context means when the filesystem is not mounted.

This could be a switched off laptop in a hotel room or a container 
image, or a powered off embedded system. When the file-system is mounted 
normal read/write access is possible.

> As far as I can tell, btrfs hashes each data block individually.  There's no
> contextual information about where eaech block is located or which file(s) it
> belongs to.  So, with this proposal, an attacker can still replace any data
> block with any other data block.  Is that what you have in mind?  Have you
> considered including contextual information in the hashes, to prevent this?
> 
> What about metadata blocks -- how well are they authenticated?  Can they be
> moved around?  And does this proposal authenticate *everything* on the
> filesystem, or is there any part that is missed?  What about the superblock?

In btrfs every metadata block is started by a checksum (see struct 
btrfs_header or struct btrfs_super_block). This checksum protects the 
whole meta-data block (minus the checksum field, obviously).

The two main structure of the trees are btrfs_node and btrfs_leaf (both 
started by a btrfs_header). struct btrfs_node holds the generation and 
the block pointers of child nodes (and leafs). Struct btrfs_leaf holds 
the items, which can be inodes, directories, extents, checksums, 
block-groups, etc...

As each FS meta-data item, beginning with the super block, downwards to 
the meta-data items themselves is protected be a checksum, so the FS 
tree (including locations, generation, etc) is protected by a checksum, 
for which the attacker would need to know the key to generate.

The checksum for data blocks is saved in a separate on-disk btree, the 
checksum tree. The structure of the checksum tree consists of 
btrfs_leafs and btrfs_nodes as well, both of which do have a 
btrfs_header and thus are protected by the checksums.

> 
> You also mentioned preventing replay of filesystem operations, which suggests
> you're trying to achieve rollback protection.  But actually this scheme doesn't
> provide rollback protection.  For one, an attacker can always just rollback the
> entire filesystem to a previous state.

You're right, replay is the wrong wording there and it's actually 
harmful in the used context. What I had in mind was, in order to change 
the structure of the filesystem, an attacker would need the key to 
update the checksums, otherwise the next read will detect a corruption.

As for a real replay case, an attacker would need to increase the 
generation of the tree block, in order to roll back to a older state, an 
attacker still would need to modify the super-block's generation, which 
is protected by the checksum as well.

> This feature would still be useful even with the above limitations.  But what is
> your goal exactly?  Can this be made better?
> 
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d10c7be10f3b..fe403fb62178 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -17,6 +17,7 @@
>>   #include <linux/error-injection.h>
>>   #include <linux/crc32c.h>
>>   #include <linux/sched/mm.h>
>> +#include <keys/user-type.h>
>>   #include <asm/unaligned.h>
>>   #include <crypto/hash.h>
>>   #include "ctree.h"
>> @@ -339,6 +340,7 @@ static bool btrfs_supported_super_csum(u16 csum_type)
>>   	case BTRFS_CSUM_TYPE_XXHASH:
>>   	case BTRFS_CSUM_TYPE_SHA256:
>>   	case BTRFS_CSUM_TYPE_BLAKE2:
>> +	case BTRFS_CSUM_TYPE_HMAC_SHA256:
>>   		return true;
>>   	default:
>>   		return false;
>> @@ -2187,6 +2189,9 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>>   {
>>   	struct crypto_shash *csum_shash;
>>   	const char *csum_driver = btrfs_super_csum_driver(csum_type);
>> +	struct key *key;
>> +	const struct user_key_payload *ukp;
>> +	int err = 0;
>>   
>>   	csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
>>   
>> @@ -2198,7 +2203,53 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>>   
>>   	fs_info->csum_shash = csum_shash;
>>   
>> -	return 0;
>> +	/*
>> +	 * if we're not doing authentication, we're done by now. Still we have
>> +	 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
>> +	 * keyed hashes.
>> +	 */
>> +	if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
>> +	    !btrfs_test_opt(fs_info, AUTH_KEY)) {
>> +		crypto_free_shash(fs_info->csum_shash);
>> +		return -EINVAL;
> 
> Seems there should be an error message here that says that a key is needed.
> 
>> +	} else if (btrfs_test_opt(fs_info, AUTH_KEY)
>> +		   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
>> +		crypto_free_shash(fs_info->csum_shash);
>> +		return -EINVAL;
> 
> The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
> gets to choose it for you among all the supported keyed hash algorithms, as soon
> as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
> does?

Can you elaborate a bit more on that? As far as I know, UBIFS doesn't 
save the 'auth_hash_name' on disk, whereas 'BTRFS_CSUM_TYPE_HMAC_SHA256' 
is part of the on-disk format. As soon as we add a 2nd keyed hash, say 
BTRFS_CSUM_TYPE_BLAKE2B_KEYED, this will be in the superblock as well, 
as struct btrfs_super_block::csum_type.

> 
>> +	} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
>> +		/*
>> +		 * This is the normal case, if noone want's authentication and
>> +		 * doesn't have a keyed hash, we're done.
>> +		 */
>> +		return 0;
>> +	}
>> +
>> +	key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
>> +	if (IS_ERR(key))
>> +		return PTR_ERR(key);
> 
> Seems this should print an error message if the key can't be found.
> 
> Also, this should enforce that the key description uses a service prefix by
> formatting it as kasprintf("btrfs:%s", fs_info->auth_key_name).  Otherwise
> people can abuse this feature to use keys that were added for other kernel
> features.  (This already got screwed up elsewhere, which makes the "logon" key
> type a bit of a disaster.  But there's no need to make it worse.)

OK, will do.
Eric Biggers May 4, 2020, 8:59 p.m. UTC | #7
On Mon, May 04, 2020 at 10:09:44AM +0000, Johannes Thumshirn wrote:
> On 01/05/2020 07:39, Eric Biggers wrote:
> [...]
> 
> Thanks for taking the time to look through this.
> 
> > 
> > This is a good idea, but can you explain exactly what security properties you
> > aim to achieve?
> 
> My goal is to protect the file-system against offline modifications. 
> Offline in this context means when the filesystem is not mounted.
> 
> This could be a switched off laptop in a hotel room or a container 
> image, or a powered off embedded system. When the file-system is mounted 
> normal read/write access is possible.

But your proposed design doesn't do this completely, since some times of offline
modifications are still possible.

So that's why I'm asking *exactly* what security properties it will provide.

> 
> > As far as I can tell, btrfs hashes each data block individually.  There's no
> > contextual information about where eaech block is located or which file(s) it
> > belongs to.  So, with this proposal, an attacker can still replace any data
> > block with any other data block.  Is that what you have in mind?  Have you
> > considered including contextual information in the hashes, to prevent this?
> > 
> > What about metadata blocks -- how well are they authenticated?  Can they be
> > moved around?  And does this proposal authenticate *everything* on the
> > filesystem, or is there any part that is missed?  What about the superblock?
> 
> In btrfs every metadata block is started by a checksum (see struct 
> btrfs_header or struct btrfs_super_block). This checksum protects the 
> whole meta-data block (minus the checksum field, obviously).
> 
> The two main structure of the trees are btrfs_node and btrfs_leaf (both 
> started by a btrfs_header). struct btrfs_node holds the generation and 
> the block pointers of child nodes (and leafs). Struct btrfs_leaf holds 
> the items, which can be inodes, directories, extents, checksums, 
> block-groups, etc...
> 
> As each FS meta-data item, beginning with the super block, downwards to 
> the meta-data items themselves is protected be a checksum, so the FS 
> tree (including locations, generation, etc) is protected by a checksum, 
> for which the attacker would need to know the key to generate.
> 
> The checksum for data blocks is saved in a separate on-disk btree, the 
> checksum tree. The structure of the checksum tree consists of 
> btrfs_leafs and btrfs_nodes as well, both of which do have a 
> btrfs_header and thus are protected by the checksums.

Does this mean that a parent node's checksum doesn't cover the checksum of its
child nodes, but rather only their locations?  Doesn't that allow subtrees to be
swapped around without being detected?

> 
> > 
> > You also mentioned preventing replay of filesystem operations, which suggests
> > you're trying to achieve rollback protection.  But actually this scheme doesn't
> > provide rollback protection.  For one, an attacker can always just rollback the
> > entire filesystem to a previous state.
> 
> You're right, replay is the wrong wording there and it's actually 
> harmful in the used context. What I had in mind was, in order to change 
> the structure of the filesystem, an attacker would need the key to 
> update the checksums, otherwise the next read will detect a corruption.
> 
> As for a real replay case, an attacker would need to increase the 
> generation of the tree block, in order to roll back to a older state, an 
> attacker still would need to modify the super-block's generation, which 
> is protected by the checksum as well.

Actually, nothing in the current design prevents the whole filesystem from being
rolled back to an earlier state.  So, an attacker can actually both "change the
structure of the filesystem" and "roll back to an earlier state".

It very well might not be practical to provide rollback protection, and this
feature would still be useful without it.  But I'm concerned that you're
claiming that this feature provides rollback protection when it doesn't.

> > The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
> > gets to choose it for you among all the supported keyed hash algorithms, as soon
> > as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
> > does?
> 
> Can you elaborate a bit more on that? As far as I know, UBIFS doesn't 
> save the 'auth_hash_name' on disk, whereas 'BTRFS_CSUM_TYPE_HMAC_SHA256' 
> is part of the on-disk format. As soon as we add a 2nd keyed hash, say 
> BTRFS_CSUM_TYPE_BLAKE2B_KEYED, this will be in the superblock as well, 
> as struct btrfs_super_block::csum_type.
> 

The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.

- Eric
Richard Weinberger May 4, 2020, 9:37 p.m. UTC | #8
----- Ursprüngliche Mail -----
>> The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
>> gets to choose it for you among all the supported keyed hash algorithms, as soon
>> as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
>> does?
> 
> Can you elaborate a bit more on that? As far as I know, UBIFS doesn't
> save the 'auth_hash_name' on disk, whereas 'BTRFS_CSUM_TYPE_HMAC_SHA256'
> is part of the on-disk format. As soon as we add a 2nd keyed hash, say
> BTRFS_CSUM_TYPE_BLAKE2B_KEYED, this will be in the superblock as well,
> as struct btrfs_super_block::csum_type.

Well, UBIFS stores auth_hash_name on disk but does not trust it.
It is always required to provide auth_hash_name as mount parameter.
At mount time it is compared to the stored name (among with other parameters)
to detect misconfigurations.

Thanks,
//richard
Richard Weinberger May 4, 2020, 9:59 p.m. UTC | #9
----- Ursprüngliche Mail -----
> Von: "Johannes Thumshirn" <jth@kernel.org>
> An: "David Sterba" <dsterba@suse.cz>
> CC: "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "linux-btrfs" <linux-btrfs@vger.kernel.org>, "Eric Biggers"
> <ebiggers@google.com>, "richard" <richard@nod.at>, "Johannes Thumshirn" <johannes.thumshirn@wdc.com>, "Johannes
> Thumshirn" <jthumshirn@suse.de>
> Gesendet: Dienstag, 28. April 2020 12:58:58
> Betreff: [PATCH v2 1/2] btrfs: add authentication support

> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Add authentication support for a BTRFS file-system.
> 
> This works, because in BTRFS every meta-data block as well as every
> data-block has a own checksum. For meta-data the checksum is in the
> meta-data node itself. For data blocks, the checksums are stored in the
> checksum tree.

Eric already raised doubts, let me ask more directly.
Does the checksum tree really cover all moving parts of BTRFS?

I'm a little surprised how small your patch is.
Getting all this done for UBIFS was not easy and given that UBIFS is truly
copy-on-write it was still less work than it would be for other filesystems.

If I understand the checksum tree correctly, the main purpose is protecting
you from flipping bits.
An attacker will perform much more sophisticated attacks.

Thanks,
//richard
Johannes Thumshirn May 5, 2020, 7:46 a.m. UTC | #10
On 04/05/2020 23:41, Richard Weinberger wrote:
> Well, UBIFS stores auth_hash_name on disk but does not trust it.
> It is always required to provide auth_hash_name as mount parameter.
> At mount time it is compared to the stored name (among with other parameters)
> to detect misconfigurations.

OK, thanks for the information.

Will do so as well in v3
Johannes Thumshirn May 5, 2020, 7:55 a.m. UTC | #11
On 04/05/2020 23:59, Richard Weinberger wrote:
> Eric already raised doubts, let me ask more directly.
> Does the checksum tree really cover all moving parts of BTRFS?
> 
> I'm a little surprised how small your patch is.
> Getting all this done for UBIFS was not easy and given that UBIFS is truly
> copy-on-write it was still less work than it would be for other filesystems.
> 
> If I understand the checksum tree correctly, the main purpose is protecting
> you from flipping bits.
> An attacker will perform much more sophisticated attacks.

[ Adding Jeff with whom I did the design work ]

The checksum tree only covers the file-system payload. But combined with 
the checksum field, which is the start of every on-disk structure, we 
have all parts of the filesystem checksummed.
Johannes Thumshirn May 5, 2020, 8:11 a.m. UTC | #12
On 04/05/2020 22:59, Eric Biggers wrote:
[...]

> But your proposed design doesn't do this completely, since some times of offline
> modifications are still possible.
> 
> So that's why I'm asking *exactly* what security properties it will provide.

[...]

> Does this mean that a parent node's checksum doesn't cover the checksum of its
> child nodes, but rather only their locations?  Doesn't that allow subtrees to be
> swapped around without being detected?

I was about to say "no you can't swap the subtrees as the header also 
stores the address of the block", but please give me some more time to 
think about it. I don't want to give a wrong answer.

[...]

> Actually, nothing in the current design prevents the whole filesystem from being
> rolled back to an earlier state.  So, an attacker can actually both "change the
> structure of the filesystem" and "roll back to an earlier state".

Can you give an example how an attacker could do a rollback of the whole 
filesystem without the key? What am I missing?

> It very well might not be practical to provide rollback protection, and this
> feature would still be useful without it.  But I'm concerned that you're
> claiming that this feature provides rollback protection when it doesn't.

OK.

[...]

> The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
> doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.

OK, will add this in the next round.

Thanks,
	Johannes
Qu Wenruo May 5, 2020, 9:26 a.m. UTC | #13
On 2020/5/5 下午4:11, Johannes Thumshirn wrote:
> On 04/05/2020 22:59, Eric Biggers wrote:
> [...]
> 
>> But your proposed design doesn't do this completely, since some times of offline
>> modifications are still possible.
>>
>> So that's why I'm asking *exactly* what security properties it will provide.
> 
> [...]
> 
>> Does this mean that a parent node's checksum doesn't cover the checksum of its
>> child nodes, but rather only their locations?  Doesn't that allow subtrees to be
>> swapped around without being detected?
> 
> I was about to say "no you can't swap the subtrees as the header also 
> stores the address of the block", but please give me some more time to 
> think about it. I don't want to give a wrong answer.

My personal idea on this swap-tree attack is, the first key, generation,
bytenr protection can prevent such case.

The protection chain begins from superblock, and ends at the leaf tree
blocks, as long as superblock is also protected by hmac hash, it should
be safe.


Btrfs protects parent-child relationship by:
- Parent has the pointer (bytenr) of its child
  The main protection. If attacker wants to swap one tree block, it must
  change the parent tree block.
  The parent is either a tree block (parent node), or root item in root
  tree, or a super block.
  All protected by hmac csum. Thus attack can only do such attach by
  knowing the key.

- Parent has the first key of its child
  Unlike previous one, this is just an extra check, no extra protection.
  And root item doesn't contain the first key.

- Parent has the generation of its child tree block
  This means, if the attacker wants to use old tree blocks (since btrfs
  also do COW, at keeps tree blocks of previous transaction), it much
  also revert its parent tree block/root item/superblock.
  The chain ends at superblock, but superblock is never COWed, means
  attacker can't easily re-create an old superblock to do such rollback
  attack.

  Also btrfs has backup super blocks, backup still differs from the
  primary by its bytenr. Thus attacker still needs the key to regenerate
  a valid primary super block to rollback.

  But this still exposed a hole for attacker to utilize, the
  usebackuproot mount option, to do such rollback attack.

  So we need to do something about it.
> 
> [...]
> 
>> Actually, nothing in the current design prevents the whole filesystem from being
>> rolled back to an earlier state.  So, an attacker can actually both "change the
>> structure of the filesystem" and "roll back to an earlier state".
> 
> Can you give an example how an attacker could do a rollback of the whole 
> filesystem without the key? What am I missing?

As explained, attacker needs the key to regenerate the primary
superblock, or use the usebackuproot mount option to abuse the recovery
oriented mount option.

The only attack I can thing of is re-generating the csum using
non-authentic algorithm, mostly in user space.
But it can be easily detected.

Thanks,
Qu

> 
>> It very well might not be practical to provide rollback protection, and this
>> feature would still be useful without it.  But I'm concerned that you're
>> claiming that this feature provides rollback protection when it doesn't.
> 
> OK.
> 
> [...]
> 
>> The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
>> doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.
> 
> OK, will add this in the next round.
> 
> Thanks,
> 	Johannes
>
Qu Wenruo May 5, 2020, 9:43 a.m. UTC | #14
On 2020/4/28 下午6:58, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Add authentication support for a BTRFS file-system.
> 
> This works, because in BTRFS every meta-data block as well as every
> data-block has a own checksum. For meta-data the checksum is in the
> meta-data node itself. For data blocks, the checksums are stored in the
> checksum tree.
> 
> When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256),
> a key is needed to mount a verified file-system. This key also needs to be
> used at file-system creation time.
> 
> We have to used a keyed hash scheme, in contrast to doing a normal
> cryptographic hash, to guarantee integrity of the file system, as a
> potential attacker could just replay file-system operations and the
> changes would go unnoticed.
> 
> Having a keyed hash only on the topmost Node of a tree or even just in the
> super-block and using cryptographic hashes on the normal meta-data nodes
> and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes
> do not include the checksums of their respective child nodes, but only the
> block pointers and offsets where to find them on disk.
> 
> Also note, we do not need a incompat R/O flag for this, because if an old
> kernel tries to mount an authenticated file-system it will fail the
> initial checksum type verification and thus refuses to mount.
> 
> The key has to be supplied by the kernel's keyring and the method of
> getting the key securely into the kernel is not subject of this patch.
> 
> Example usage:
> Create a file-system with authentication key 0123456
> mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk
> 
> Add the key to the kernel's keyring as keyid 'btrfs:foo'
> keyctl add logon btrfs:foo 0123456 @u
> 
> Mount the fs using the 'btrfs:foo' key
> mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

Looks pretty straight forward, and has the basic protection against
re-writing all csum attack.

So looks good to me.

But still I have one question around the device scan part.

Since now superblock can only be read after verified the csum, it means
without auth_key mount option, device scan won't even work properly.

Do you assume that all such hmac protected multi-device btrfs must be
mounted using device= mount option along with auth_key?
If so, a lot of users won't be that happy afaik.

Thanks,
Qu

> ---
>  fs/btrfs/ctree.c                |  3 ++-
>  fs/btrfs/ctree.h                |  2 ++
>  fs/btrfs/disk-io.c              | 53 ++++++++++++++++++++++++++++++++++++++++-
>  fs/btrfs/super.c                | 24 ++++++++++++++++---
>  include/uapi/linux/btrfs_tree.h |  1 +
>  5 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 6c28efe5b14a..76418b5b00a6 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -31,7 +31,7 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
>  
>  static const struct btrfs_csums {
>  	u16		size;
> -	const char	name[10];
> +	const char	name[12];
>  	const char	driver[12];
>  } btrfs_csums[] = {
>  	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
> @@ -39,6 +39,7 @@ static const struct btrfs_csums {
>  	[BTRFS_CSUM_TYPE_SHA256] = { .size = 32, .name = "sha256" },
>  	[BTRFS_CSUM_TYPE_BLAKE2] = { .size = 32, .name = "blake2b",
>  				     .driver = "blake2b-256" },
> +	[BTRFS_CSUM_TYPE_HMAC_SHA256] = { .size = 32, .name = "hmac(sha256)" }
>  };
>  
>  int btrfs_super_csum_size(const struct btrfs_super_block *s)
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index c79e0b0eac54..b692b3dc4593 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -719,6 +719,7 @@ struct btrfs_fs_info {
>  	struct rb_root swapfile_pins;
>  
>  	struct crypto_shash *csum_shash;
> +	char *auth_key_name;
>  
>  	/*
>  	 * Number of send operations in progress.
> @@ -1027,6 +1028,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
>  #define BTRFS_MOUNT_NOLOGREPLAY		(1 << 27)
>  #define BTRFS_MOUNT_REF_VERIFY		(1 << 28)
>  #define BTRFS_MOUNT_DISCARD_ASYNC	(1 << 29)
> +#define BTRFS_MOUNT_AUTH_KEY		(1 << 30)
>  
>  #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
>  #define BTRFS_DEFAULT_MAX_INLINE	(2048)
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d10c7be10f3b..fe403fb62178 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/error-injection.h>
>  #include <linux/crc32c.h>
>  #include <linux/sched/mm.h>
> +#include <keys/user-type.h>
>  #include <asm/unaligned.h>
>  #include <crypto/hash.h>
>  #include "ctree.h"
> @@ -339,6 +340,7 @@ static bool btrfs_supported_super_csum(u16 csum_type)
>  	case BTRFS_CSUM_TYPE_XXHASH:
>  	case BTRFS_CSUM_TYPE_SHA256:
>  	case BTRFS_CSUM_TYPE_BLAKE2:
> +	case BTRFS_CSUM_TYPE_HMAC_SHA256:
>  		return true;
>  	default:
>  		return false;
> @@ -2187,6 +2189,9 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>  {
>  	struct crypto_shash *csum_shash;
>  	const char *csum_driver = btrfs_super_csum_driver(csum_type);
> +	struct key *key;
> +	const struct user_key_payload *ukp;
> +	int err = 0;
>  
>  	csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
>  
> @@ -2198,7 +2203,53 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>  
>  	fs_info->csum_shash = csum_shash;
>  
> -	return 0;
> +	/*
> +	 * if we're not doing authentication, we're done by now. Still we have
> +	 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
> +	 * keyed hashes.
> +	 */
> +	if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
> +	    !btrfs_test_opt(fs_info, AUTH_KEY)) {
> +		crypto_free_shash(fs_info->csum_shash);
> +		return -EINVAL;
> +	} else if (btrfs_test_opt(fs_info, AUTH_KEY)
> +		   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
> +		crypto_free_shash(fs_info->csum_shash);
> +		return -EINVAL;
> +	} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
> +		/*
> +		 * This is the normal case, if noone want's authentication and
> +		 * doesn't have a keyed hash, we're done.
> +		 */
> +		return 0;
> +	}
> +
> +	key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
> +	if (IS_ERR(key))
> +		return PTR_ERR(key);
> +
> +	down_read(&key->sem);
> +
> +	ukp = user_key_payload_locked(key);
> +	if (!ukp) {
> +		btrfs_err(fs_info, "");
> +		err = -EKEYREVOKED;
> +		goto out;
> +	}
> +
> +	err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
> +	if (err)
> +		btrfs_err(fs_info, "error setting key %s for verification",
> +			  fs_info->auth_key_name);
> +
> +out:
> +	if (err)
> +		crypto_free_shash(fs_info->csum_shash);
> +
> +	up_read(&key->sem);
> +	key_put(key);
> +
> +	return err;
>  }
>  
>  static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 7932d8d07cff..2645a9cee8d1 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -333,6 +333,7 @@ enum {
>  	Opt_treelog, Opt_notreelog,
>  	Opt_usebackuproot,
>  	Opt_user_subvol_rm_allowed,
> +	Opt_auth_key,
>  
>  	/* Deprecated options */
>  	Opt_alloc_start,
> @@ -401,6 +402,7 @@ static const match_table_t tokens = {
>  	{Opt_notreelog, "notreelog"},
>  	{Opt_usebackuproot, "usebackuproot"},
>  	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
> +	{Opt_auth_key, "auth_key=%s"},
>  
>  	/* Deprecated options */
>  	{Opt_alloc_start, "alloc_start=%s"},
> @@ -910,7 +912,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>   * All other options will be parsed on much later in the mount process and
>   * only when we need to allocate a new super block.
>   */
> -static int btrfs_parse_device_options(const char *options, fmode_t flags,
> +static int btrfs_parse_device_options(struct btrfs_fs_info *info,
> +				      const char *options, fmode_t flags,
>  				      void *holder)
>  {
>  	substring_t args[MAX_OPT_ARGS];
> @@ -939,7 +942,8 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>  			continue;
>  
>  		token = match_token(p, tokens, args);
> -		if (token == Opt_device) {
> +		switch (token) {
> +		case Opt_device:
>  			device_name = match_strdup(&args[0]);
>  			if (!device_name) {
>  				error = -ENOMEM;
> @@ -952,6 +956,18 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>  				error = PTR_ERR(device);
>  				goto out;
>  			}
> +			break;
> +		case Opt_auth_key:
> +			info->auth_key_name = match_strdup(&args[0]);
> +			if (!info->auth_key_name) {
> +				error = -ENOMEM;
> +				goto out;
> +			}
> +			btrfs_info(info, "doing authentication");
> +			btrfs_set_opt(info->mount_opt, AUTH_KEY);
> +			break;
> +		default:
> +			break;
>  		}
>  	}
>  
> @@ -1394,6 +1410,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>  #endif
>  	if (btrfs_test_opt(info, REF_VERIFY))
>  		seq_puts(seq, ",ref_verify");
> +	if (btrfs_test_opt(info, AUTH_KEY))
> +		seq_printf(seq, ",auth_key=%s", info->auth_key_name);
>  	seq_printf(seq, ",subvolid=%llu",
>  		  BTRFS_I(d_inode(dentry))->root->root_key.objectid);
>  	seq_puts(seq, ",subvol=");
> @@ -1542,7 +1560,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>  	}
>  
>  	mutex_lock(&uuid_mutex);
> -	error = btrfs_parse_device_options(data, mode, fs_type);
> +	error = btrfs_parse_device_options(fs_info, data, mode, fs_type);
>  	if (error) {
>  		mutex_unlock(&uuid_mutex);
>  		goto error_fs_info;
> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
> index a02318e4d2a9..bfaf127b37fd 100644
> --- a/include/uapi/linux/btrfs_tree.h
> +++ b/include/uapi/linux/btrfs_tree.h
> @@ -344,6 +344,7 @@ enum btrfs_csum_type {
>  	BTRFS_CSUM_TYPE_XXHASH	= 1,
>  	BTRFS_CSUM_TYPE_SHA256	= 2,
>  	BTRFS_CSUM_TYPE_BLAKE2	= 3,
> +	BTRFS_CSUM_TYPE_HMAC_SHA256 = 32,
>  };
>  
>  /*
>
Qu Wenruo May 5, 2020, 9:59 a.m. UTC | #15
On 2020/5/5 下午5:26, Qu Wenruo wrote:
> 
> 
> On 2020/5/5 下午4:11, Johannes Thumshirn wrote:
>> On 04/05/2020 22:59, Eric Biggers wrote:
>> [...]
>>
>>> But your proposed design doesn't do this completely, since some times of offline
>>> modifications are still possible.
>>>
>>> So that's why I'm asking *exactly* what security properties it will provide.
>>
>> [...]
>>
>>> Does this mean that a parent node's checksum doesn't cover the checksum of its
>>> child nodes, but rather only their locations?  Doesn't that allow subtrees to be
>>> swapped around without being detected?
>>
>> I was about to say "no you can't swap the subtrees as the header also 
>> stores the address of the block", but please give me some more time to 
>> think about it. I don't want to give a wrong answer.
> 
> My personal idea on this swap-tree attack is, the first key, generation,
> bytenr protection can prevent such case.
> 
> The protection chain begins from superblock, and ends at the leaf tree
> blocks, as long as superblock is also protected by hmac hash, it should
> be safe.
> 
> 
> Btrfs protects parent-child relationship by:
> - Parent has the pointer (bytenr) of its child
>   The main protection. If attacker wants to swap one tree block, it must
>   change the parent tree block.
>   The parent is either a tree block (parent node), or root item in root
>   tree, or a super block.
>   All protected by hmac csum. Thus attack can only do such attach by
>   knowing the key.
> 
> - Parent has the first key of its child
>   Unlike previous one, this is just an extra check, no extra protection.
>   And root item doesn't contain the first key.
> 
> - Parent has the generation of its child tree block
>   This means, if the attacker wants to use old tree blocks (since btrfs
>   also do COW, at keeps tree blocks of previous transaction), it much
>   also revert its parent tree block/root item/superblock.
>   The chain ends at superblock, but superblock is never COWed, means
>   attacker can't easily re-create an old superblock to do such rollback
>   attack.
> 
>   Also btrfs has backup super blocks, backup still differs from the
>   primary by its bytenr. Thus attacker still needs the key to regenerate
>   a valid primary super block to rollback.
> 
>   But this still exposed a hole for attacker to utilize, the
>   usebackuproot mount option, to do such rollback attack.
> 
>   So we need to do something about it.
>>
>> [...]
>>
>>> Actually, nothing in the current design prevents the whole filesystem from being
>>> rolled back to an earlier state.  So, an attacker can actually both "change the
>>> structure of the filesystem" and "roll back to an earlier state".
>>
>> Can you give an example how an attacker could do a rollback of the whole 
>> filesystem without the key? What am I missing?
> 
> As explained, attacker needs the key to regenerate the primary
> superblock, or use the usebackuproot mount option to abuse the recovery
> oriented mount option.

After some more thought, there is a narrow window where the attacker can
reliably revert the fs to its previous transaction (but only one
transaction earilier).

If the attacker can access the underlying block disk, then it can backup
the current superblock, and replay it to the disk after exactly one
transaction being committed.

The fs will be reverted to one transaction earlier, without triggering
any hmac csum mismatch.

If the attacker tries to revert to 2 or more transactions, it's pretty
possible that the attacker will just screw up the fs, as btrfs only
keeps all the tree blocks of previous transaction for its COW.

I'm not sure how valuable such attack is, as even the attacker can
revert the status of the fs to one trans earlier, the metadata and COWed
data (default) are still all validated.

Only nodatacow data is affected.

To defend against such attack, we may need extra mount option to verify
the super generation?

Thanks,
Qu

> 
> The only attack I can thing of is re-generating the csum using
> non-authentic algorithm, mostly in user space.
> But it can be easily detected.
> 
> Thanks,
> Qu
> 
>>
>>> It very well might not be practical to provide rollback protection, and this
>>> feature would still be useful without it.  But I'm concerned that you're
>>> claiming that this feature provides rollback protection when it doesn't.
>>
>> OK.
>>
>> [...]
>>
>>> The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
>>> doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.
>>
>> OK, will add this in the next round.
>>
>> Thanks,
>> 	Johannes
>>
>
Richard Weinberger May 5, 2020, 11:56 a.m. UTC | #16
----- Ursprüngliche Mail -----
> Von: "Johannes Thumshirn" <Johannes.Thumshirn@wdc.com>
> An: "richard" <richard@nod.at>
> CC: "Eric Biggers" <ebiggers@kernel.org>, "Johannes Thumshirn" <jth@kernel.org>, "David Sterba" <dsterba@suse.cz>,
> "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "linux-btrfs" <linux-btrfs@vger.kernel.org>, "david"
> <david@sigma-star.at>, "Sascha Hauer" <s.hauer@pengutronix.de>
> Gesendet: Dienstag, 5. Mai 2020 09:46:42
> Betreff: Re: [PATCH v2 1/2] btrfs: add authentication support

> On 04/05/2020 23:41, Richard Weinberger wrote:
>> Well, UBIFS stores auth_hash_name on disk but does not trust it.
>> It is always required to provide auth_hash_name as mount parameter.
>> At mount time it is compared to the stored name (among with other parameters)
>> to detect misconfigurations.
> 
> OK, thanks for the information.
> 
> Will do so as well in v3

With UBIFS this is now the second in-tree filesystem with authentication support.
IMHO it is worth adding a new statx flag to denote this. Just like we do already
for encrypted and verity protected files.
STATX_ATTR_AUTHED?

Especially for BTRFS user this is valubale information since BTRFS authentication
is incompatible with nodatacow'ed files/dirs/subvolumes. And it might be not obvious which files are
protected and which are not. 

Thanks,
//richard
Jeff Mahoney May 5, 2020, 12:36 p.m. UTC | #17
On 5/5/20 3:55 AM, Johannes Thumshirn wrote:
> On 04/05/2020 23:59, Richard Weinberger wrote:
>> Eric already raised doubts, let me ask more directly.
>> Does the checksum tree really cover all moving parts of BTRFS?
>>
>> I'm a little surprised how small your patch is.
>> Getting all this done for UBIFS was not easy and given that UBIFS is truly
>> copy-on-write it was still less work than it would be for other filesystems.
>>
>> If I understand the checksum tree correctly, the main purpose is protecting
>> you from flipping bits.
>> An attacker will perform much more sophisticated attacks.
> 
> [ Adding Jeff with whom I did the design work ]
> 
> The checksum tree only covers the file-system payload. But combined with 
> the checksum field, which is the start of every on-disk structure, we 
> have all parts of the filesystem checksummed.

That the checksums were originally intended for bitflip protection isn't
really relevant.  Using a different algorithm doesn't change the
fundamentals and the disk format was designed to use larger checksums
than crc32c.  The checksum tree covers file data.  The contextual
information is in the metadata describing the disk blocks and all the
metadata blocks have internal checksums that would also be
authenticated.  The only weak spot is that there has been a historical
race where a user submits a write using direct i/o and modifies the data
while in flight.  This will cause CRC failures already and that would
still happen with this.

All that said, the biggest weak spot I see in the design was mentioned
on LWN: We require the key to mount the file system at all and there's
no way to have a read-only but still verifiable file system.  That's
worth examining further.

-Jeff
Qu Wenruo May 5, 2020, 12:39 p.m. UTC | #18
On 2020/5/5 下午8:36, Jeff Mahoney wrote:
> On 5/5/20 3:55 AM, Johannes Thumshirn wrote:
>> On 04/05/2020 23:59, Richard Weinberger wrote:
>>> Eric already raised doubts, let me ask more directly.
>>> Does the checksum tree really cover all moving parts of BTRFS?
>>>
>>> I'm a little surprised how small your patch is.
>>> Getting all this done for UBIFS was not easy and given that UBIFS is truly
>>> copy-on-write it was still less work than it would be for other filesystems.
>>>
>>> If I understand the checksum tree correctly, the main purpose is protecting
>>> you from flipping bits.
>>> An attacker will perform much more sophisticated attacks.
>>
>> [ Adding Jeff with whom I did the design work ]
>>
>> The checksum tree only covers the file-system payload. But combined with 
>> the checksum field, which is the start of every on-disk structure, we 
>> have all parts of the filesystem checksummed.
> 
> That the checksums were originally intended for bitflip protection isn't
> really relevant.  Using a different algorithm doesn't change the
> fundamentals and the disk format was designed to use larger checksums
> than crc32c.  The checksum tree covers file data.  The contextual
> information is in the metadata describing the disk blocks and all the
> metadata blocks have internal checksums that would also be
> authenticated.  The only weak spot is that there has been a historical
> race where a user submits a write using direct i/o and modifies the data
> while in flight.  This will cause CRC failures already and that would
> still happen with this.
> 
> All that said, the biggest weak spot I see in the design was mentioned
> on LWN: We require the key to mount the file system at all and there's
> no way to have a read-only but still verifiable file system.  That's
> worth examining further.

That can be done easily, with something like ignore_auth mount option to
completely skip hmac csum check (of course, need full RO mount, no log
replay, no way to remount rw), completely rely on bytenr/gen/first_key
and tree-checker to verify the fs.

Thanks,
Qu

> 
> -Jeff
>
Jeff Mahoney May 5, 2020, 12:41 p.m. UTC | #19
On 5/5/20 8:39 AM, Qu Wenruo wrote:
> 
> 
> On 2020/5/5 下午8:36, Jeff Mahoney wrote:
>> On 5/5/20 3:55 AM, Johannes Thumshirn wrote:
>>> On 04/05/2020 23:59, Richard Weinberger wrote:
>>>> Eric already raised doubts, let me ask more directly.
>>>> Does the checksum tree really cover all moving parts of BTRFS?
>>>>
>>>> I'm a little surprised how small your patch is.
>>>> Getting all this done for UBIFS was not easy and given that UBIFS is truly
>>>> copy-on-write it was still less work than it would be for other filesystems.
>>>>
>>>> If I understand the checksum tree correctly, the main purpose is protecting
>>>> you from flipping bits.
>>>> An attacker will perform much more sophisticated attacks.
>>>
>>> [ Adding Jeff with whom I did the design work ]
>>>
>>> The checksum tree only covers the file-system payload. But combined with 
>>> the checksum field, which is the start of every on-disk structure, we 
>>> have all parts of the filesystem checksummed.
>>
>> That the checksums were originally intended for bitflip protection isn't
>> really relevant.  Using a different algorithm doesn't change the
>> fundamentals and the disk format was designed to use larger checksums
>> than crc32c.  The checksum tree covers file data.  The contextual
>> information is in the metadata describing the disk blocks and all the
>> metadata blocks have internal checksums that would also be
>> authenticated.  The only weak spot is that there has been a historical
>> race where a user submits a write using direct i/o and modifies the data
>> while in flight.  This will cause CRC failures already and that would
>> still happen with this.
>>
>> All that said, the biggest weak spot I see in the design was mentioned
>> on LWN: We require the key to mount the file system at all and there's
>> no way to have a read-only but still verifiable file system.  That's
>> worth examining further.
> 
> That can be done easily, with something like ignore_auth mount option to
> completely skip hmac csum check (of course, need full RO mount, no log
> replay, no way to remount rw), completely rely on bytenr/gen/first_key
> and tree-checker to verify the fs.

But then you lose even bitflip protection.

-Jeff
Qu Wenruo May 5, 2020, 12:48 p.m. UTC | #20
On 2020/5/5 下午8:41, Jeff Mahoney wrote:
> On 5/5/20 8:39 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/5/5 下午8:36, Jeff Mahoney wrote:
>>> On 5/5/20 3:55 AM, Johannes Thumshirn wrote:
>>>> On 04/05/2020 23:59, Richard Weinberger wrote:
>>>>> Eric already raised doubts, let me ask more directly.
>>>>> Does the checksum tree really cover all moving parts of BTRFS?
>>>>>
>>>>> I'm a little surprised how small your patch is.
>>>>> Getting all this done for UBIFS was not easy and given that UBIFS is truly
>>>>> copy-on-write it was still less work than it would be for other filesystems.
>>>>>
>>>>> If I understand the checksum tree correctly, the main purpose is protecting
>>>>> you from flipping bits.
>>>>> An attacker will perform much more sophisticated attacks.
>>>>
>>>> [ Adding Jeff with whom I did the design work ]
>>>>
>>>> The checksum tree only covers the file-system payload. But combined with 
>>>> the checksum field, which is the start of every on-disk structure, we 
>>>> have all parts of the filesystem checksummed.
>>>
>>> That the checksums were originally intended for bitflip protection isn't
>>> really relevant.  Using a different algorithm doesn't change the
>>> fundamentals and the disk format was designed to use larger checksums
>>> than crc32c.  The checksum tree covers file data.  The contextual
>>> information is in the metadata describing the disk blocks and all the
>>> metadata blocks have internal checksums that would also be
>>> authenticated.  The only weak spot is that there has been a historical
>>> race where a user submits a write using direct i/o and modifies the data
>>> while in flight.  This will cause CRC failures already and that would
>>> still happen with this.
>>>
>>> All that said, the biggest weak spot I see in the design was mentioned
>>> on LWN: We require the key to mount the file system at all and there's
>>> no way to have a read-only but still verifiable file system.  That's
>>> worth examining further.
>>
>> That can be done easily, with something like ignore_auth mount option to
>> completely skip hmac csum check (of course, need full RO mount, no log
>> replay, no way to remount rw), completely rely on bytenr/gen/first_key
>> and tree-checker to verify the fs.
> 
> But then you lose even bitflip protection.

That's why we have tree-checker for metadata.

Most detected bitflips look like from readtime tree-checker, as most of
them are bit flip in memory.
It looks like bitflip in block device is less common, as most physical
block devices have internal checksum. Bitflip there tends to cause EIO
other than bad data.

For data part, I have to admit that we lose the check completely, but
read-only mount is still better than unable to mount at all.

However such ignore_auth may need extra attention on the device assembly
part, as it can be another attacking vector (e.g. create extra device
with higher generation to override the genuine device), so it will not
be that easy as I thought.

Thanks,
Qu

> 
> -Jeff
>
David Sterba May 5, 2020, 10:14 p.m. UTC | #21
On Mon, May 04, 2020 at 01:59:35PM -0700, Eric Biggers wrote:
> On Mon, May 04, 2020 at 10:09:44AM +0000, Johannes Thumshirn wrote:
> > On 01/05/2020 07:39, Eric Biggers wrote:
> > > The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
> > > gets to choose it for you among all the supported keyed hash algorithms, as soon
> > > as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
> > > does?
> > 
> > Can you elaborate a bit more on that? As far as I know, UBIFS doesn't 
> > save the 'auth_hash_name' on disk, whereas 'BTRFS_CSUM_TYPE_HMAC_SHA256' 
> > is part of the on-disk format. As soon as we add a 2nd keyed hash, say 
> > BTRFS_CSUM_TYPE_BLAKE2B_KEYED, this will be in the superblock as well, 
> > as struct btrfs_super_block::csum_type.
> 
> The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
> doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.

Once the auth key and filesystem is set up, that's true. The special
case is the superblock itself. It's a chicken-egg problem: we cannot
trust the superblock data until we verify the checksum, but what
checksum should be used is stored in the superblock itself.

This can be solved by requesting the checksum type externally, like the
mount option, but for the simple checksums it puts the burden on the
user and basically requires the mkfs-time settings to be permanently
used for mounting. I do not consider that a good usability.

Without the mount option, the approach we use right now is to use the
checksum type stored in the untrusted superblock, verify it and if it
matches, claim that everything is ok. The plain checksum can be
obviously subverted, just set it to something else nad recalculate the
checksum.

But then everything else will fail because the subverted checksum type
will fail on each metadata block, which immediatelly hits the 1st class
btree roots pointed to by the super block.

The same can be done with all metadata blocks, still assuming a simple
checksum.

Using that example, the authenticated checksum cannot be subverted on
the superblock. So even if there are untrusted superblock data used, it
won't even pass the verification of the superblock itself.
David Sterba May 5, 2020, 10:19 p.m. UTC | #22
On Tue, May 05, 2020 at 08:11:56AM +0000, Johannes Thumshirn wrote:
> On 04/05/2020 22:59, Eric Biggers wrote:
> [...]
> 
> > But your proposed design doesn't do this completely, since some times of offline
> > modifications are still possible.
> > 
> > So that's why I'm asking *exactly* what security properties it will provide.
> 
> [...]
> 
> > Does this mean that a parent node's checksum doesn't cover the checksum of its
> > child nodes, but rather only their locations?  Doesn't that allow subtrees to be
> > swapped around without being detected?
> 
> I was about to say "no you can't swap the subtrees as the header also 
> stores the address of the block", but please give me some more time to 
> think about it. I don't want to give a wrong answer.

Note that block addresses are of two types, the physical and logical.
The metadata blocks use the logical one, so the block can be moved to
another location still maintaining the authenticated checksum, but then
the physical address will not match. And the physical<->logical mapping
is stored as metadata item, thus in the metadata blocks protected by the
authenticated checksum.
Eric Biggers May 5, 2020, 10:31 p.m. UTC | #23
On Wed, May 06, 2020 at 12:14:48AM +0200, David Sterba wrote:
> On Mon, May 04, 2020 at 01:59:35PM -0700, Eric Biggers wrote:
> > On Mon, May 04, 2020 at 10:09:44AM +0000, Johannes Thumshirn wrote:
> > > On 01/05/2020 07:39, Eric Biggers wrote:
> > > > The hash algorithm needs to be passed as a mount option.  Otherwise the attacker
> > > > gets to choose it for you among all the supported keyed hash algorithms, as soon
> > > > as support for a second one is added.  Maybe use 'auth_hash_name' like UBIFS
> > > > does?
> > > 
> > > Can you elaborate a bit more on that? As far as I know, UBIFS doesn't 
> > > save the 'auth_hash_name' on disk, whereas 'BTRFS_CSUM_TYPE_HMAC_SHA256' 
> > > is part of the on-disk format. As soon as we add a 2nd keyed hash, say 
> > > BTRFS_CSUM_TYPE_BLAKE2B_KEYED, this will be in the superblock as well, 
> > > as struct btrfs_super_block::csum_type.
> > 
> > The data on disk isn't trusted.  Isn't that the whole point?  The filesystem
> > doesn't trust it until it is MAC'ed, and to do that it needs the MAC algorithm.
> 
> Once the auth key and filesystem is set up, that's true. The special
> case is the superblock itself. It's a chicken-egg problem: we cannot
> trust the superblock data until we verify the checksum, but what
> checksum should be used is stored in the superblock itself.
> 
> This can be solved by requesting the checksum type externally, like the
> mount option, but for the simple checksums it puts the burden on the
> user and basically requires the mkfs-time settings to be permanently
> used for mounting. I do not consider that a good usability.
> 
> Without the mount option, the approach we use right now is to use the
> checksum type stored in the untrusted superblock, verify it and if it
> matches, claim that everything is ok. The plain checksum can be
> obviously subverted, just set it to something else nad recalculate the
> checksum.
> 
> But then everything else will fail because the subverted checksum type
> will fail on each metadata block, which immediatelly hits the 1st class
> btree roots pointed to by the super block.
> 
> The same can be done with all metadata blocks, still assuming a simple
> checksum.
> 
> Using that example, the authenticated checksum cannot be subverted on
> the superblock. So even if there are untrusted superblock data used, it
> won't even pass the verification of the superblock itself.

You're missing the point.  For unkeyed hashes, there's no need to provide the
hash algorithm name at mount time, as there's no authentication anyway.  But for
keyed hashes (as added by this patch) it is needed.  If the attacker gets to
choose the algorithms for you, you don't have a valid cryptosystem.

- Eric
David Sterba May 5, 2020, 10:32 p.m. UTC | #24
On Tue, May 05, 2020 at 05:59:14PM +0800, Qu Wenruo wrote:
> After some more thought, there is a narrow window where the attacker can
> reliably revert the fs to its previous transaction (but only one
> transaction earilier).
> 
> If the attacker can access the underlying block disk, then it can backup
> the current superblock, and replay it to the disk after exactly one
> transaction being committed.
> 
> The fs will be reverted to one transaction earlier, without triggering
> any hmac csum mismatch.
> 
> If the attacker tries to revert to 2 or more transactions, it's pretty
> possible that the attacker will just screw up the fs, as btrfs only
> keeps all the tree blocks of previous transaction for its COW.
> 
> I'm not sure how valuable such attack is, as even the attacker can
> revert the status of the fs to one trans earlier, the metadata and COWed
> data (default) are still all validated.
> 
> Only nodatacow data is affected.

I agree with the above, this looks like the only relialbe attack that
can safely switch to previous transaction. This is effectively the
consistency model of btrfs, to have the current and new transaction
epoch, where the transition is done atomic overwrite of the superblock.

And exactly at this moment the old copy of superblock can be overwritten
back, as if the system crashed just before writing the new one.

From now on With each data/metadata update, new metadata blocks are
cowed and allocated and may start to overwrite the metadata from the
previous transaction. So the reliability of an undetected and unnoticed
revert to previous transaction is decreasing.

And this is on a live filesystem, the attacker would have to somehow
prevent new writes, then rewrite superblock and force new mount.

> To defend against such attack, we may need extra mount option to verify
> the super generation?

I probably don't understand what you mean here, like keeping the last
committed generation somewhere else and then have the user pass it to
mount?
David Sterba May 5, 2020, 10:33 p.m. UTC | #25
On Mon, May 04, 2020 at 08:38:36AM +0000, Johannes Thumshirn wrote:
> On 01/05/2020 08:30, Eric Biggers wrote:
> > btrfs also has an inode flag BTRFS_INODE_NODATASUM, which looks scary as it
> > results in the file being unauthenticated.  Presumably the authentication of the
> > filesystem metadata is supposed to prevent this flag from being maliciously
> > cleared?  It might be a good idea to forbid this flag if the filesystem is using
> > the authentication feature.
> 
> Yes indeed, authentication and nodatasum must be mutually exclusive.

Which also means that nodatacow can't be used as it implies nodatasum.
Eric Biggers May 5, 2020, 10:37 p.m. UTC | #26
On Tue, May 05, 2020 at 08:11:56AM +0000, Johannes Thumshirn wrote:
> On 04/05/2020 22:59, Eric Biggers wrote:
> [...]
> 
> > But your proposed design doesn't do this completely, since some times of offline
> > modifications are still possible.
> > 
> > So that's why I'm asking *exactly* what security properties it will provide.
> 
> [...]
> 
> > Does this mean that a parent node's checksum doesn't cover the checksum of its
> > child nodes, but rather only their locations?  Doesn't that allow subtrees to be
> > swapped around without being detected?
> 
> I was about to say "no you can't swap the subtrees as the header also 
> stores the address of the block", but please give me some more time to 
> think about it. I don't want to give a wrong answer.
> 
> [...]
> 
> > Actually, nothing in the current design prevents the whole filesystem from being
> > rolled back to an earlier state.  So, an attacker can actually both "change the
> > structure of the filesystem" and "roll back to an earlier state".
> 
> Can you give an example how an attacker could do a rollback of the whole 
> filesystem without the key? What am I missing?
> 

They replace the current content of the block device with the content at an
earlier time.

- Eric
David Sterba May 5, 2020, 10:46 p.m. UTC | #27
On Tue, May 05, 2020 at 03:31:20PM -0700, Eric Biggers wrote:
> > Using that example, the authenticated checksum cannot be subverted on
> > the superblock. So even if there are untrusted superblock data used, it
> > won't even pass the verification of the superblock itself.
> 
> You're missing the point.  For unkeyed hashes, there's no need to provide the
> hash algorithm name at mount time, as there's no authentication anyway.  But for
> keyed hashes (as added by this patch) it is needed.  If the attacker gets to
> choose the algorithms for you, you don't have a valid cryptosystem.

I think we need to be more specific as I don't see how this contradicts
what I've said, perhaps you'll show me the exact point where I missed
it.

An example superblock contains:

	u8 checksum[32];
	int hash_type;
	u8 the_rest[256];

The checksum is calculated from offsetof(hash_type) to the end of the
structure. Then it is stored to the checksum array, and whole block is
stored on disk.

Valid superblock created by user contains may look like:

	.checksum = 0x123456
	.hash_type = 0x1	/* hmac(sha256) */
	.the_rest = ...;

Without a valid key, none of the hash_type or the_rest can be changed
without producing a valid checksum.

When you say 'if attacker gets to chose the algorithms' I understand it
as change to hash_type, eg. setting it to 0x2 which would be
hmac(blake2b).

So maybe it violates some principle of not letting the attacker choice
happen at all, but how would the attack continue to produce a valid
checksum?
David Sterba May 5, 2020, 11 p.m. UTC | #28
On Mon, May 04, 2020 at 11:59:16PM +0200, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
> > Von: "Johannes Thumshirn" <jth@kernel.org>
> > An: "David Sterba" <dsterba@suse.cz>
> > CC: "linux-fsdevel" <linux-fsdevel@vger.kernel.org>, "linux-btrfs" <linux-btrfs@vger.kernel.org>, "Eric Biggers"
> > <ebiggers@google.com>, "richard" <richard@nod.at>, "Johannes Thumshirn" <johannes.thumshirn@wdc.com>, "Johannes
> > Thumshirn" <jthumshirn@suse.de>
> > Gesendet: Dienstag, 28. April 2020 12:58:58
> > Betreff: [PATCH v2 1/2] btrfs: add authentication support
> 
> > From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > 
> > Add authentication support for a BTRFS file-system.
> > 
> > This works, because in BTRFS every meta-data block as well as every
> > data-block has a own checksum. For meta-data the checksum is in the
> > meta-data node itself. For data blocks, the checksums are stored in the
> > checksum tree.
> 
> Eric already raised doubts, let me ask more directly.
> Does the checksum tree really cover all moving parts of BTRFS?
> 
> I'm a little surprised how small your patch is.
> Getting all this done for UBIFS was not easy and given that UBIFS is truly
> copy-on-write it was still less work than it would be for other filesystems.

The patch is small because the amount if cross-referencing between the
structures and "noise" in the structures is assumed to be sufficient so
just the calculation of the new checksum needs to be added.

Using 'assumed' must naturally raise eyebrows, what we all want is a
proof that it is so, and I believe this is the core of the work here but
it's missing so we unfortunatelly have to take the rounds in this thread
and actually dig out the details. The hmac support won't be merged
without making things clear and documented.
David Sterba May 5, 2020, 11:02 p.m. UTC | #29
On Tue, May 05, 2020 at 08:39:21PM +0800, Qu Wenruo wrote:
> > That the checksums were originally intended for bitflip protection isn't
> > really relevant.  Using a different algorithm doesn't change the
> > fundamentals and the disk format was designed to use larger checksums
> > than crc32c.  The checksum tree covers file data.  The contextual
> > information is in the metadata describing the disk blocks and all the
> > metadata blocks have internal checksums that would also be
> > authenticated.  The only weak spot is that there has been a historical
> > race where a user submits a write using direct i/o and modifies the data
> > while in flight.  This will cause CRC failures already and that would
> > still happen with this.
> > 
> > All that said, the biggest weak spot I see in the design was mentioned
> > on LWN: We require the key to mount the file system at all and there's
> > no way to have a read-only but still verifiable file system.  That's
> > worth examining further.
> 
> That can be done easily, with something like ignore_auth mount option to
> completely skip hmac csum check (of course, need full RO mount, no log
> replay, no way to remount rw), completely rely on bytenr/gen/first_key
> and tree-checker to verify the fs.

Technical note: no unnecessary mount options, reuse the auth_key with
some special value.
Eric Biggers May 5, 2020, 11:31 p.m. UTC | #30
On Wed, May 06, 2020 at 12:46:11AM +0200, David Sterba wrote:
> On Tue, May 05, 2020 at 03:31:20PM -0700, Eric Biggers wrote:
> > > Using that example, the authenticated checksum cannot be subverted on
> > > the superblock. So even if there are untrusted superblock data used, it
> > > won't even pass the verification of the superblock itself.
> > 
> > You're missing the point.  For unkeyed hashes, there's no need to provide the
> > hash algorithm name at mount time, as there's no authentication anyway.  But for
> > keyed hashes (as added by this patch) it is needed.  If the attacker gets to
> > choose the algorithms for you, you don't have a valid cryptosystem.
> 
> I think we need to be more specific as I don't see how this contradicts
> what I've said, perhaps you'll show me the exact point where I missed
> it.
> 
> An example superblock contains:
> 
> 	u8 checksum[32];
> 	int hash_type;
> 	u8 the_rest[256];
> 
> The checksum is calculated from offsetof(hash_type) to the end of the
> structure. Then it is stored to the checksum array, and whole block is
> stored on disk.
> 
> Valid superblock created by user contains may look like:
> 
> 	.checksum = 0x123456
> 	.hash_type = 0x1	/* hmac(sha256) */
> 	.the_rest = ...;
> 
> Without a valid key, none of the hash_type or the_rest can be changed
> without producing a valid checksum.
> 
> When you say 'if attacker gets to chose the algorithms' I understand it
> as change to hash_type, eg. setting it to 0x2 which would be
> hmac(blake2b).
> 
> So maybe it violates some principle of not letting the attacker choice
> happen at all, but how would the attack continue to produce a valid
> checksum?

Example: you add support for keyed hash algorithm X, and it later turns out that
X is totally broken (or was never meant to be a cryptographic hash in the first
place, but was mistakenly allowed for authentication).  You deprecate it and
tell people not to use it.  But it doesn't matter because you screwed up the
design and the attacker can still choose algorithm X anyway.

This is a basic cryptographic principle, I'm surprised this isn't obvious.

- Eric
Qu Wenruo May 5, 2020, 11:55 p.m. UTC | #31
On 2020/5/6 上午6:32, David Sterba wrote:
> On Tue, May 05, 2020 at 05:59:14PM +0800, Qu Wenruo wrote:
>> After some more thought, there is a narrow window where the attacker can
>> reliably revert the fs to its previous transaction (but only one
>> transaction earilier).
>>
>> If the attacker can access the underlying block disk, then it can backup
>> the current superblock, and replay it to the disk after exactly one
>> transaction being committed.
>>
>> The fs will be reverted to one transaction earlier, without triggering
>> any hmac csum mismatch.
>>
>> If the attacker tries to revert to 2 or more transactions, it's pretty
>> possible that the attacker will just screw up the fs, as btrfs only
>> keeps all the tree blocks of previous transaction for its COW.
>>
>> I'm not sure how valuable such attack is, as even the attacker can
>> revert the status of the fs to one trans earlier, the metadata and COWed
>> data (default) are still all validated.
>>
>> Only nodatacow data is affected.
> 
> I agree with the above, this looks like the only relialbe attack that
> can safely switch to previous transaction. This is effectively the
> consistency model of btrfs, to have the current and new transaction
> epoch, where the transition is done atomic overwrite of the superblock.
> 
> And exactly at this moment the old copy of superblock can be overwritten
> back, as if the system crashed just before writing the new one.
> 
> From now on With each data/metadata update, new metadata blocks are
> cowed and allocated and may start to overwrite the metadata from the
> previous transaction. So the reliability of an undetected and unnoticed
> revert to previous transaction is decreasing.
> 
> And this is on a live filesystem, the attacker would have to somehow
> prevent new writes, then rewrite superblock and force new mount.
> 
>> To defend against such attack, we may need extra mount option to verify
>> the super generation?
> 
> I probably don't understand what you mean here, like keeping the last
> committed generation somewhere else and then have the user pass it to
> mount?
> 
Yes, that's my original idea.

Thanks,
Qu
David Sterba May 6, 2020, 12:29 a.m. UTC | #32
On Tue, May 05, 2020 at 04:31:10PM -0700, Eric Biggers wrote:
> On Wed, May 06, 2020 at 12:46:11AM +0200, David Sterba wrote:
> > On Tue, May 05, 2020 at 03:31:20PM -0700, Eric Biggers wrote:
> Example: you add support for keyed hash algorithm X, and it later turns out that
> X is totally broken (or was never meant to be a cryptographic hash in the first
> place, but was mistakenly allowed for authentication).  You deprecate it and
> tell people not to use it.  But it doesn't matter because you screwed up the
> design and the attacker can still choose algorithm X anyway.
> 
> This is a basic cryptographic principle, I'm surprised this isn't obvious.

I want to avoid confusion raising from too much assuming and obvious
calling, from the filesystem side and from the crypto side. I can say
that it's clear that the existing data structures provide enough
guarantees for authentication, and that it's obvious.

But I don't do that and maybe it looks dumb and uninformed but I don't
care as long as the end result is ack from a crypto-knowleagable people
that it all plays well together and there are no further doubts.

Back to the example. The problem with deprecation hasn't been brought up
so far but I probably lack imagination _how_ can an attacker choose the
algorithm in the context of the filesystem. That this is easy in
scenarios with some kind of handshake is obvious, eg. the SSL/TLS
downgrade attacks. But I see a big difference in the persistence nature
of network connections and filesystems/storage, so the number of
opportunities to force bad algorithm is quite different. Mkfs time for
sure, and at mount it's in the example I provided in my previous email.

If some algorithm is found to be broken and unsuitable for
authentication it will be a big thing. Users will be sure told to stop
using it but the existing deployments can't be saved. The support from
mkfs can be removed, kernel will warn or refuse to mount the
filesystems, etc. but what else can be done from the design point of
view?
Eric Biggers May 6, 2020, 12:44 a.m. UTC | #33
On Wed, May 06, 2020 at 02:29:48AM +0200, David Sterba wrote:
> Back to the example. The problem with deprecation hasn't been brought up
> so far but I probably lack imagination _how_ can an attacker choose the
> algorithm in the context of the filesystem.

They just set the field on-disk that specifies the authentication algorithm.

> If some algorithm is found to be broken and unsuitable for
> authentication it will be a big thing. Users will be sure told to stop
> using it but the existing deployments can't be saved. The support from
> mkfs can be removed, kernel will warn or refuse to mount the
> filesystems, etc. but what else can be done from the design point of
> view?

Require that the authentication algorithm be passed as a mount option, and
validate that it matches what's on-disk.

- Eric
Johannes Thumshirn May 6, 2020, 8:10 a.m. UTC | #34
On 06/05/2020 00:34, David Sterba wrote:
> On Mon, May 04, 2020 at 08:38:36AM +0000, Johannes Thumshirn wrote:
>> On 01/05/2020 08:30, Eric Biggers wrote:
>>> btrfs also has an inode flag BTRFS_INODE_NODATASUM, which looks scary as it
>>> results in the file being unauthenticated.  Presumably the authentication of the
>>> filesystem metadata is supposed to prevent this flag from being maliciously
>>> cleared?  It might be a good idea to forbid this flag if the filesystem is using
>>> the authentication feature.
>>
>> Yes indeed, authentication and nodatasum must be mutually exclusive.
> 
> Which also means that nodatacow can't be used as it implies nodatasum.
> 


Yep, did this already.
Johannes Thumshirn May 6, 2020, 8:30 a.m. UTC | #35
On 06/05/2020 00:37, Eric Biggers wrote:
> They replace the current content of the block device with the content at an
> earlier time.

We could mitigate this (at a later time probably), by writing the super 
block generation into some trusted store like a TPM.

But I think this would need general support from the kernel.

But thanks for showing me this attack, I'll document that this attack 
will still work.
Goffredo Baroncelli May 6, 2020, 8:40 p.m. UTC | #36
Hi Qu,

I will go a bit off topic, because I am interested more in the understanding of the btrees than the topic of this thread
On 5/5/20 11:26 AM, Qu Wenruo wrote:
[...]
> 
> My personal idea on this swap-tree attack is, the first key, generation,
> bytenr protection can prevent such case.
> 
> The protection chain begins from superblock, and ends at the leaf tree
> blocks, as long as superblock is also protected by hmac hash, it should
> be safe.
> 
> 
> Btrfs protects parent-child relationship by:
> - Parent has the pointer (bytenr) of its child
>    The main protection. If attacker wants to swap one tree block, it must
>    change the parent tree block.
>    The parent is either a tree block (parent node), or root item in root
>    tree, or a super block.
>    All protected by hmac csum. Thus attack can only do such attach by
>    knowing the key.
> 
> - Parent has the first key of its child
>    Unlike previous one, this is just an extra check, no extra protection.
>    And root item doesn't contain the first key.

It always true ? When a key is inserted, we update the key of the parent to be equal to the first of the (right) child. However when a key is removed, this should be not mandatory. Is it enough that the parent key is greater (or equal) than the first key of the left node, and lesser than the last of the right node ?

Supposing to have

              10
            /    \
1 2 3 4 5         10 11 12 13

If you remove 10 in the right child node, is it mandatory to updated the '10' in the parent node (to 11) ?


[...]
Goffredo Baroncelli May 6, 2020, 8:59 p.m. UTC | #37
On 5/5/20 11:43 AM, Qu Wenruo wrote:
> 
> 
> On 2020/4/28 下午6:58, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> Add authentication support for a BTRFS file-system.
>>
>> This works, because in BTRFS every meta-data block as well as every
>> data-block has a own checksum. For meta-data the checksum is in the
>> meta-data node itself. For data blocks, the checksums are stored in the
>> checksum tree.
>>
>> When replacing the checksum algorithm with a keyed hash, like HMAC(SHA256),
>> a key is needed to mount a verified file-system. This key also needs to be
>> used at file-system creation time.
>>
>> We have to used a keyed hash scheme, in contrast to doing a normal
>> cryptographic hash, to guarantee integrity of the file system, as a
>> potential attacker could just replay file-system operations and the
>> changes would go unnoticed.
>>
>> Having a keyed hash only on the topmost Node of a tree or even just in the
>> super-block and using cryptographic hashes on the normal meta-data nodes
>> and checksum tree entries doesn't work either, as the BTRFS B-Tree's Nodes
>> do not include the checksums of their respective child nodes, but only the
>> block pointers and offsets where to find them on disk.
>>
>> Also note, we do not need a incompat R/O flag for this, because if an old
>> kernel tries to mount an authenticated file-system it will fail the
>> initial checksum type verification and thus refuses to mount.
>>
>> The key has to be supplied by the kernel's keyring and the method of
>> getting the key securely into the kernel is not subject of this patch.
>>
>> Example usage:
>> Create a file-system with authentication key 0123456
>> mkfs.btrfs --csum hmac-sha256 --auth-key 0123456 /dev/disk
>>
>> Add the key to the kernel's keyring as keyid 'btrfs:foo'
>> keyctl add logon btrfs:foo 0123456 @u
>>
>> Mount the fs using the 'btrfs:foo' key
>> mount -t btrfs -o auth_key=btrfs:foo /dev/disk /mnt/point
>>
>> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> 
> Looks pretty straight forward, and has the basic protection against
> re-writing all csum attack.
> 
> So looks good to me.
> 
> But still I have one question around the device scan part.
> 
> Since now superblock can only be read after verified the csum, it means
> without auth_key mount option, device scan won't even work properly.

It is really needed to perform the checksum at "scan" time ? It should be possible to defer the cksum at the mount time.
The device scan is used only to track the disks for building a filesystem. We could check only the "magic", the UUID and the FSID.
The check of the cheksum may be performed at mount time.
> 
> Do you assume that all such hmac protected multi-device btrfs must be
> mounted using device= mount option along with auth_key?
> If so, a lot of users won't be that happy afaik.
> 
> Thanks,
> Qu
> 
>> ---
>>   fs/btrfs/ctree.c                |  3 ++-
>>   fs/btrfs/ctree.h                |  2 ++
>>   fs/btrfs/disk-io.c              | 53 ++++++++++++++++++++++++++++++++++++++++-
>>   fs/btrfs/super.c                | 24 ++++++++++++++++---
>>   include/uapi/linux/btrfs_tree.h |  1 +
>>   5 files changed, 78 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 6c28efe5b14a..76418b5b00a6 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -31,7 +31,7 @@ static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
>>   
>>   static const struct btrfs_csums {
>>   	u16		size;
>> -	const char	name[10];
>> +	const char	name[12];
>>   	const char	driver[12];
>>   } btrfs_csums[] = {
>>   	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
>> @@ -39,6 +39,7 @@ static const struct btrfs_csums {
>>   	[BTRFS_CSUM_TYPE_SHA256] = { .size = 32, .name = "sha256" },
>>   	[BTRFS_CSUM_TYPE_BLAKE2] = { .size = 32, .name = "blake2b",
>>   				     .driver = "blake2b-256" },
>> +	[BTRFS_CSUM_TYPE_HMAC_SHA256] = { .size = 32, .name = "hmac(sha256)" }
>>   };
>>   
>>   int btrfs_super_csum_size(const struct btrfs_super_block *s)
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index c79e0b0eac54..b692b3dc4593 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -719,6 +719,7 @@ struct btrfs_fs_info {
>>   	struct rb_root swapfile_pins;
>>   
>>   	struct crypto_shash *csum_shash;
>> +	char *auth_key_name;
>>   
>>   	/*
>>   	 * Number of send operations in progress.
>> @@ -1027,6 +1028,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
>>   #define BTRFS_MOUNT_NOLOGREPLAY		(1 << 27)
>>   #define BTRFS_MOUNT_REF_VERIFY		(1 << 28)
>>   #define BTRFS_MOUNT_DISCARD_ASYNC	(1 << 29)
>> +#define BTRFS_MOUNT_AUTH_KEY		(1 << 30)
>>   
>>   #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
>>   #define BTRFS_DEFAULT_MAX_INLINE	(2048)
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d10c7be10f3b..fe403fb62178 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -17,6 +17,7 @@
>>   #include <linux/error-injection.h>
>>   #include <linux/crc32c.h>
>>   #include <linux/sched/mm.h>
>> +#include <keys/user-type.h>
>>   #include <asm/unaligned.h>
>>   #include <crypto/hash.h>
>>   #include "ctree.h"
>> @@ -339,6 +340,7 @@ static bool btrfs_supported_super_csum(u16 csum_type)
>>   	case BTRFS_CSUM_TYPE_XXHASH:
>>   	case BTRFS_CSUM_TYPE_SHA256:
>>   	case BTRFS_CSUM_TYPE_BLAKE2:
>> +	case BTRFS_CSUM_TYPE_HMAC_SHA256:
>>   		return true;
>>   	default:
>>   		return false;
>> @@ -2187,6 +2189,9 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>>   {
>>   	struct crypto_shash *csum_shash;
>>   	const char *csum_driver = btrfs_super_csum_driver(csum_type);
>> +	struct key *key;
>> +	const struct user_key_payload *ukp;
>> +	int err = 0;
>>   
>>   	csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
>>   
>> @@ -2198,7 +2203,53 @@ static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
>>   
>>   	fs_info->csum_shash = csum_shash;
>>   
>> -	return 0;
>> +	/*
>> +	 * if we're not doing authentication, we're done by now. Still we have
>> +	 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
>> +	 * keyed hashes.
>> +	 */
>> +	if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
>> +	    !btrfs_test_opt(fs_info, AUTH_KEY)) {
>> +		crypto_free_shash(fs_info->csum_shash);
>> +		return -EINVAL;
>> +	} else if (btrfs_test_opt(fs_info, AUTH_KEY)
>> +		   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
>> +		crypto_free_shash(fs_info->csum_shash);
>> +		return -EINVAL;
>> +	} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
>> +		/*
>> +		 * This is the normal case, if noone want's authentication and
>> +		 * doesn't have a keyed hash, we're done.
>> +		 */
>> +		return 0;
>> +	}
>> +
>> +	key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
>> +	if (IS_ERR(key))
>> +		return PTR_ERR(key);
>> +
>> +	down_read(&key->sem);
>> +
>> +	ukp = user_key_payload_locked(key);
>> +	if (!ukp) {
>> +		btrfs_err(fs_info, "");
>> +		err = -EKEYREVOKED;
>> +		goto out;
>> +	}
>> +
>> +	err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
>> +	if (err)
>> +		btrfs_err(fs_info, "error setting key %s for verification",
>> +			  fs_info->auth_key_name);
>> +
>> +out:
>> +	if (err)
>> +		crypto_free_shash(fs_info->csum_shash);
>> +
>> +	up_read(&key->sem);
>> +	key_put(key);
>> +
>> +	return err;
>>   }
>>   
>>   static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 7932d8d07cff..2645a9cee8d1 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -333,6 +333,7 @@ enum {
>>   	Opt_treelog, Opt_notreelog,
>>   	Opt_usebackuproot,
>>   	Opt_user_subvol_rm_allowed,
>> +	Opt_auth_key,
>>   
>>   	/* Deprecated options */
>>   	Opt_alloc_start,
>> @@ -401,6 +402,7 @@ static const match_table_t tokens = {
>>   	{Opt_notreelog, "notreelog"},
>>   	{Opt_usebackuproot, "usebackuproot"},
>>   	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
>> +	{Opt_auth_key, "auth_key=%s"},
>>   
>>   	/* Deprecated options */
>>   	{Opt_alloc_start, "alloc_start=%s"},
>> @@ -910,7 +912,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
>>    * All other options will be parsed on much later in the mount process and
>>    * only when we need to allocate a new super block.
>>    */
>> -static int btrfs_parse_device_options(const char *options, fmode_t flags,
>> +static int btrfs_parse_device_options(struct btrfs_fs_info *info,
>> +				      const char *options, fmode_t flags,
>>   				      void *holder)
>>   {
>>   	substring_t args[MAX_OPT_ARGS];
>> @@ -939,7 +942,8 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>>   			continue;
>>   
>>   		token = match_token(p, tokens, args);
>> -		if (token == Opt_device) {
>> +		switch (token) {
>> +		case Opt_device:
>>   			device_name = match_strdup(&args[0]);
>>   			if (!device_name) {
>>   				error = -ENOMEM;
>> @@ -952,6 +956,18 @@ static int btrfs_parse_device_options(const char *options, fmode_t flags,
>>   				error = PTR_ERR(device);
>>   				goto out;
>>   			}
>> +			break;
>> +		case Opt_auth_key:
>> +			info->auth_key_name = match_strdup(&args[0]);
>> +			if (!info->auth_key_name) {
>> +				error = -ENOMEM;
>> +				goto out;
>> +			}
>> +			btrfs_info(info, "doing authentication");
>> +			btrfs_set_opt(info->mount_opt, AUTH_KEY);
>> +			break;
>> +		default:
>> +			break;
>>   		}
>>   	}
>>   
>> @@ -1394,6 +1410,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
>>   #endif
>>   	if (btrfs_test_opt(info, REF_VERIFY))
>>   		seq_puts(seq, ",ref_verify");
>> +	if (btrfs_test_opt(info, AUTH_KEY))
>> +		seq_printf(seq, ",auth_key=%s", info->auth_key_name);
>>   	seq_printf(seq, ",subvolid=%llu",
>>   		  BTRFS_I(d_inode(dentry))->root->root_key.objectid);
>>   	seq_puts(seq, ",subvol=");
>> @@ -1542,7 +1560,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>>   	}
>>   
>>   	mutex_lock(&uuid_mutex);
>> -	error = btrfs_parse_device_options(data, mode, fs_type);
>> +	error = btrfs_parse_device_options(fs_info, data, mode, fs_type);
>>   	if (error) {
>>   		mutex_unlock(&uuid_mutex);
>>   		goto error_fs_info;
>> diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
>> index a02318e4d2a9..bfaf127b37fd 100644
>> --- a/include/uapi/linux/btrfs_tree.h
>> +++ b/include/uapi/linux/btrfs_tree.h
>> @@ -344,6 +344,7 @@ enum btrfs_csum_type {
>>   	BTRFS_CSUM_TYPE_XXHASH	= 1,
>>   	BTRFS_CSUM_TYPE_SHA256	= 2,
>>   	BTRFS_CSUM_TYPE_BLAKE2	= 3,
>> +	BTRFS_CSUM_TYPE_HMAC_SHA256 = 32,
>>   };
>>   
>>   /*
>>
>
Goffredo Baroncelli May 6, 2020, 9:24 p.m. UTC | #38
On 5/5/20 2:36 PM, Jeff Mahoney wrote:
> On 5/5/20 3:55 AM, Johannes Thumshirn wrote:
>> On 04/05/2020 23:59, Richard Weinberger wrote:
>>> Eric already raised doubts, let me ask more directly.
>>> Does the checksum tree really cover all moving parts of BTRFS?
>>>
>>> I'm a little surprised how small your patch is.
>>> Getting all this done for UBIFS was not easy and given that UBIFS is truly
>>> copy-on-write it was still less work than it would be for other filesystems.
>>>
>>> If I understand the checksum tree correctly, the main purpose is protecting
>>> you from flipping bits.
>>> An attacker will perform much more sophisticated attacks.
>>
>> [ Adding Jeff with whom I did the design work ]
>>
>> The checksum tree only covers the file-system payload. But combined with
>> the checksum field, which is the start of every on-disk structure, we
>> have all parts of the filesystem checksummed.
> 
> That the checksums were originally intended for bitflip protection isn't
> really relevant.  Using a different algorithm doesn't change the
> fundamentals and the disk format was designed to use larger checksums
> than crc32c.  The checksum tree covers file data.  The contextual
> information is in the metadata describing the disk blocks and all the
> metadata blocks have internal checksums that would also be
> authenticated.  


> The only weak spot is that there has been a historical
> race where a user submits a write using direct i/o and modifies the data
> while in flight.  This will cause CRC failures already and that would
> still happen with this.
I faced this issue few years ago.
However it would be sufficient to disable DIRECT IO for a DATASUM file.
And I think that this should be done even for a "non authenticate" filesystem.
Allow the users to use a feature that can cause a bad crc to me doesn't seems a good idea.

BTW it seems that ZFS ignore O_DIRECT

https://github.com/openzfs/zfs/issues/224


> 
> All that said, the biggest weak spot I see in the design was mentioned
> on LWN: We require the key to mount the file system at all and there's
> no way to have a read-only but still verifiable file system.  That's
> worth examining further.
> 
> -Jeff
>
Qu Wenruo May 6, 2020, 10:57 p.m. UTC | #39
On 2020/5/7 上午4:40, Goffredo Baroncelli wrote:
> Hi Qu,
> 
> I will go a bit off topic, because I am interested more in the
> understanding of the btrees than the topic of this thread

Then removing unrelated CCs to reduce the noise.

> On 5/5/20 11:26 AM, Qu Wenruo wrote:
> [...]
>>
>> My personal idea on this swap-tree attack is, the first key, generation,
>> bytenr protection can prevent such case.
>>
>> The protection chain begins from superblock, and ends at the leaf tree
>> blocks, as long as superblock is also protected by hmac hash, it should
>> be safe.
>>
>>
>> Btrfs protects parent-child relationship by:
>> - Parent has the pointer (bytenr) of its child
>>    The main protection. If attacker wants to swap one tree block, it must
>>    change the parent tree block.
>>    The parent is either a tree block (parent node), or root item in root
>>    tree, or a super block.
>>    All protected by hmac csum. Thus attack can only do such attach by
>>    knowing the key.
>>
>> - Parent has the first key of its child
>>    Unlike previous one, this is just an extra check, no extra protection.
>>    And root item doesn't contain the first key.
> 
> It always true ? When a key is inserted, we update the key of the parent
> to be equal to the first of the (right) child. However when a key is
> removed, this should be not mandatory. Is it enough that the parent key
> is greater (or equal) than the first key of the left node, and lesser
> than the last of the right node ?
> 
> Supposing to have
> 
>             1 10 (A)
>            /    \
> 1 2 3 4 5 (B)     10 11 12 13 (C)
> 
> If you remove 10 in the right child node, is it mandatory to updated the
> '10' in the parent node (to 11) ?

Yes. And we're always COW so tree block C and A will get COWed (and if A
has parents, the path towards the tree root will get COWed).

If we remove 10, then the result would be:
	1 11 (Cowed A)
       /    \
1 ~ 5 (B)    11 12 13 (Cowed C)

Thanks,
Qu

> 
> 
> [...]
>
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6c28efe5b14a..76418b5b00a6 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -31,7 +31,7 @@  static void del_ptr(struct btrfs_root *root, struct btrfs_path *path,
 
 static const struct btrfs_csums {
 	u16		size;
-	const char	name[10];
+	const char	name[12];
 	const char	driver[12];
 } btrfs_csums[] = {
 	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
@@ -39,6 +39,7 @@  static const struct btrfs_csums {
 	[BTRFS_CSUM_TYPE_SHA256] = { .size = 32, .name = "sha256" },
 	[BTRFS_CSUM_TYPE_BLAKE2] = { .size = 32, .name = "blake2b",
 				     .driver = "blake2b-256" },
+	[BTRFS_CSUM_TYPE_HMAC_SHA256] = { .size = 32, .name = "hmac(sha256)" }
 };
 
 int btrfs_super_csum_size(const struct btrfs_super_block *s)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c79e0b0eac54..b692b3dc4593 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -719,6 +719,7 @@  struct btrfs_fs_info {
 	struct rb_root swapfile_pins;
 
 	struct crypto_shash *csum_shash;
+	char *auth_key_name;
 
 	/*
 	 * Number of send operations in progress.
@@ -1027,6 +1028,7 @@  static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
 #define BTRFS_MOUNT_NOLOGREPLAY		(1 << 27)
 #define BTRFS_MOUNT_REF_VERIFY		(1 << 28)
 #define BTRFS_MOUNT_DISCARD_ASYNC	(1 << 29)
+#define BTRFS_MOUNT_AUTH_KEY		(1 << 30)
 
 #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
 #define BTRFS_DEFAULT_MAX_INLINE	(2048)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d10c7be10f3b..fe403fb62178 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -17,6 +17,7 @@ 
 #include <linux/error-injection.h>
 #include <linux/crc32c.h>
 #include <linux/sched/mm.h>
+#include <keys/user-type.h>
 #include <asm/unaligned.h>
 #include <crypto/hash.h>
 #include "ctree.h"
@@ -339,6 +340,7 @@  static bool btrfs_supported_super_csum(u16 csum_type)
 	case BTRFS_CSUM_TYPE_XXHASH:
 	case BTRFS_CSUM_TYPE_SHA256:
 	case BTRFS_CSUM_TYPE_BLAKE2:
+	case BTRFS_CSUM_TYPE_HMAC_SHA256:
 		return true;
 	default:
 		return false;
@@ -2187,6 +2189,9 @@  static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
 {
 	struct crypto_shash *csum_shash;
 	const char *csum_driver = btrfs_super_csum_driver(csum_type);
+	struct key *key;
+	const struct user_key_payload *ukp;
+	int err = 0;
 
 	csum_shash = crypto_alloc_shash(csum_driver, 0, 0);
 
@@ -2198,7 +2203,53 @@  static int btrfs_init_csum_hash(struct btrfs_fs_info *fs_info, u16 csum_type)
 
 	fs_info->csum_shash = csum_shash;
 
-	return 0;
+	/*
+	 * if we're not doing authentication, we're done by now. Still we have
+	 * to validate the possible combinations of BTRFS_MOUNT_AUTH_KEY and
+	 * keyed hashes.
+	 */
+	if (csum_type == BTRFS_CSUM_TYPE_HMAC_SHA256 &&
+	    !btrfs_test_opt(fs_info, AUTH_KEY)) {
+		crypto_free_shash(fs_info->csum_shash);
+		return -EINVAL;
+	} else if (btrfs_test_opt(fs_info, AUTH_KEY)
+		   && csum_type != BTRFS_CSUM_TYPE_HMAC_SHA256) {
+		crypto_free_shash(fs_info->csum_shash);
+		return -EINVAL;
+	} else if (!btrfs_test_opt(fs_info, AUTH_KEY)) {
+		/*
+		 * This is the normal case, if noone want's authentication and
+		 * doesn't have a keyed hash, we're done.
+		 */
+		return 0;
+	}
+
+	key = request_key(&key_type_logon, fs_info->auth_key_name, NULL);
+	if (IS_ERR(key))
+		return PTR_ERR(key);
+
+	down_read(&key->sem);
+
+	ukp = user_key_payload_locked(key);
+	if (!ukp) {
+		btrfs_err(fs_info, "");
+		err = -EKEYREVOKED;
+		goto out;
+	}
+
+	err = crypto_shash_setkey(fs_info->csum_shash, ukp->data, ukp->datalen);
+	if (err)
+		btrfs_err(fs_info, "error setting key %s for verification",
+			  fs_info->auth_key_name);
+
+out:
+	if (err)
+		crypto_free_shash(fs_info->csum_shash);
+
+	up_read(&key->sem);
+	key_put(key);
+
+	return err;
 }
 
 static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7932d8d07cff..2645a9cee8d1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -333,6 +333,7 @@  enum {
 	Opt_treelog, Opt_notreelog,
 	Opt_usebackuproot,
 	Opt_user_subvol_rm_allowed,
+	Opt_auth_key,
 
 	/* Deprecated options */
 	Opt_alloc_start,
@@ -401,6 +402,7 @@  static const match_table_t tokens = {
 	{Opt_notreelog, "notreelog"},
 	{Opt_usebackuproot, "usebackuproot"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
+	{Opt_auth_key, "auth_key=%s"},
 
 	/* Deprecated options */
 	{Opt_alloc_start, "alloc_start=%s"},
@@ -910,7 +912,8 @@  int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
  * All other options will be parsed on much later in the mount process and
  * only when we need to allocate a new super block.
  */
-static int btrfs_parse_device_options(const char *options, fmode_t flags,
+static int btrfs_parse_device_options(struct btrfs_fs_info *info,
+				      const char *options, fmode_t flags,
 				      void *holder)
 {
 	substring_t args[MAX_OPT_ARGS];
@@ -939,7 +942,8 @@  static int btrfs_parse_device_options(const char *options, fmode_t flags,
 			continue;
 
 		token = match_token(p, tokens, args);
-		if (token == Opt_device) {
+		switch (token) {
+		case Opt_device:
 			device_name = match_strdup(&args[0]);
 			if (!device_name) {
 				error = -ENOMEM;
@@ -952,6 +956,18 @@  static int btrfs_parse_device_options(const char *options, fmode_t flags,
 				error = PTR_ERR(device);
 				goto out;
 			}
+			break;
+		case Opt_auth_key:
+			info->auth_key_name = match_strdup(&args[0]);
+			if (!info->auth_key_name) {
+				error = -ENOMEM;
+				goto out;
+			}
+			btrfs_info(info, "doing authentication");
+			btrfs_set_opt(info->mount_opt, AUTH_KEY);
+			break;
+		default:
+			break;
 		}
 	}
 
@@ -1394,6 +1410,8 @@  static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 #endif
 	if (btrfs_test_opt(info, REF_VERIFY))
 		seq_puts(seq, ",ref_verify");
+	if (btrfs_test_opt(info, AUTH_KEY))
+		seq_printf(seq, ",auth_key=%s", info->auth_key_name);
 	seq_printf(seq, ",subvolid=%llu",
 		  BTRFS_I(d_inode(dentry))->root->root_key.objectid);
 	seq_puts(seq, ",subvol=");
@@ -1542,7 +1560,7 @@  static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 	}
 
 	mutex_lock(&uuid_mutex);
-	error = btrfs_parse_device_options(data, mode, fs_type);
+	error = btrfs_parse_device_options(fs_info, data, mode, fs_type);
 	if (error) {
 		mutex_unlock(&uuid_mutex);
 		goto error_fs_info;
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index a02318e4d2a9..bfaf127b37fd 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -344,6 +344,7 @@  enum btrfs_csum_type {
 	BTRFS_CSUM_TYPE_XXHASH	= 1,
 	BTRFS_CSUM_TYPE_SHA256	= 2,
 	BTRFS_CSUM_TYPE_BLAKE2	= 3,
+	BTRFS_CSUM_TYPE_HMAC_SHA256 = 32,
 };
 
 /*