
[v3,00/13] Add support for other checksums

Message ID 20190522081910.7689-1-jthumshirn@suse.de (mailing list archive)

Message

Johannes Thumshirn May 22, 2019, 8:18 a.m. UTC
This patchset adds support for adding new checksum types in BTRFS.

Currently BTRFS only supports CRC32C as data and metadata checksum, which is
good if you only want to detect errors due to data corruption in hardware.

But CRC32C isn't able to cover other use-cases like de-duplication or
cryptographically safe data integrity guarantees.

The following properties made SHA-256 interesting for these use-cases:
- Still considered cryptographically sound
- Reasonably well understood by the security industry
- Result fits into the 32 bytes/256 bits we have for the checksum in the
  on-disk format
- Low enough collision probability to make it feasible for data de-duplication
- Fast enough to calculate and offloadable to crypto hardware via the kernel's
  crypto_shash framework.

The patchset also provides mechanisms that make plumbing in different hash
algorithms relatively easy.
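
To illustrate the size constraint and the idea of plumbing in algorithms by
type, here is a minimal userspace sketch in Python. It is not the kernel code
from this series: the type IDs, the table layout and the csum_data() helper
are assumptions made up for the example. Only the 32-byte on-disk field and
the two algorithms (CRC32C, SHA-256) come from the text above.

```python
import hashlib

# Size of the on-disk checksum field in bytes, as stated in the cover letter.
CSUM_FIELD_SIZE = 32


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Hypothetical type IDs, chosen only for this sketch.
# Each entry: (name, digest size in bytes, digest function).
CSUM_TYPES = {
    0: ("crc32c", 4, lambda d: crc32c(d).to_bytes(4, "little")),
    1: ("sha256", 32, lambda d: hashlib.sha256(d).digest()),
}


def csum_data(csum_type: int, data: bytes) -> bytes:
    """Checksum data with the selected type, zero-padded to the field size."""
    if csum_type not in CSUM_TYPES:
        raise ValueError(f"unsupported checksum type {csum_type}")
    name, size, func = CSUM_TYPES[csum_type]
    digest = func(data)
    # Every supported digest has to fit into the fixed on-disk field.
    if len(digest) != size or size > CSUM_FIELD_SIZE:
        raise ValueError(f"{name}: digest does not fit the checksum field")
    return digest.ljust(CSUM_FIELD_SIZE, b"\0")
```

With a table like this, adding an algorithm means adding one entry, and
callers never hard-code a 4-byte checksum size.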

Unfortunately this patchset also partially reverts commit: 
9678c54388b6 ("btrfs: Remove custom crc32c init code")

This is an intermediate submission, as a) mkfs.btrfs support is still missing
and b) David requested to have three hash algorithms, where one is crc32c, one
is cryptographically secure and one is in between.

A changelog can be found directly in the patches. The branch is also available
on a gitweb at
https://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git/log/?h=btrfs-csum-rework.v3

Johannes Thumshirn (13):
  btrfs: use btrfs_csum_data() instead of directly calling crc32c
  btrfs: resurrect btrfs_crc32c()
  btrfs: use btrfs_crc32c{,_final}() in for free space cache
  btrfs: don't assume ordered sums to be 4 bytes
  btrfs: dont assume compressed_bio sums to be 4 bytes
  btrfs: format checksums according to type for printing
  btrfs: add common checksum type validation
  btrfs: check for supported superblock checksum type before checksum
    validation
  btrfs: Simplify btrfs_check_super_csum() and get rid of size
    assumptions
  btrfs: add boilerplate code for directly including the crypto
    framework
  btrfs: directly call into crypto framework for checsumming
  btrfs: remove assumption about csum type form
    btrfs_print_data_csum_error()
  btrfs: add sha256 as another checksum algorithm

 fs/btrfs/Kconfig                |   4 +-
 fs/btrfs/btrfs_inode.h          |  33 +++++++--
 fs/btrfs/check-integrity.c      |  12 ++--
 fs/btrfs/compression.c          |  40 +++++++----
 fs/btrfs/compression.h          |   2 +-
 fs/btrfs/ctree.h                |  27 +++++++-
 fs/btrfs/disk-io.c              | 146 ++++++++++++++++++++++++++--------------
 fs/btrfs/disk-io.h              |   2 -
 fs/btrfs/extent-tree.c          |   6 +-
 fs/btrfs/file-item.c            |  44 +++++++-----
 fs/btrfs/free-space-cache.c     |  10 ++-
 fs/btrfs/inode.c                |  20 ++++--
 fs/btrfs/ordered-data.c         |  10 +--
 fs/btrfs/ordered-data.h         |   4 +-
 fs/btrfs/scrub.c                |  39 ++++++++---
 fs/btrfs/send.c                 |   2 +-
 fs/btrfs/super.c                |   2 +
 include/uapi/linux/btrfs_tree.h |   6 +-
 18 files changed, 280 insertions(+), 129 deletions(-)

Comments

David Sterba May 27, 2019, 5:19 p.m. UTC | #1
On Wed, May 22, 2019 at 10:18:57AM +0200, Johannes Thumshirn wrote:
> This patchset adds support for adding new checksum types in BTRFS.
> 
> Currently BTRFS only supports CRC32C as data and metadata checksum, which is
> good if you only want to detect errors due to data corruption in hardware.
> 
> But CRC32C isn't able to cover other use-cases like de-duplication or
> cryptographically safe data integrity guarantees.
> 
> The following properties made SHA-256 interesting for these use-cases:
> - Still considered cryptographically sound
> - Reasonably well understood by the security industry
> - Result fits into the 32 bytes/256 bits we have for the checksum in the
>   on-disk format
> - Low enough collision probability to make it feasible for data de-duplication
> - Fast enough to calculate and offloadable to crypto hardware via the kernel's
>   crypto_shash framework.
> 
> The patchset also provides mechanisms that make plumbing in different hash
> algorithms relatively easy.
> 
> Unfortunately this patchset also partially reverts commit: 
> 9678c54388b6 ("btrfs: Remove custom crc32c init code")
> 
> This is an intermediate submission, as a) mkfs.btrfs support is still missing
> and b) David requested to have three hash algorithms, where one is crc32c, one
> is cryptographically secure and one is in between.
> 
> A changelog can be found directly in the patches. The branch is also available
> on a gitweb at
> https://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git/log/?h=btrfs-csum-rework.v3
> 
> Johannes Thumshirn (13):
>   btrfs: use btrfs_csum_data() instead of directly calling crc32c
>   btrfs: resurrect btrfs_crc32c()
>   btrfs: use btrfs_crc32c{,_final}() in for free space cache
>   btrfs: don't assume ordered sums to be 4 bytes
>   btrfs: dont assume compressed_bio sums to be 4 bytes
>   btrfs: format checksums according to type for printing
>   btrfs: add common checksum type validation
>   btrfs: check for supported superblock checksum type before checksum
>     validation
>   btrfs: Simplify btrfs_check_super_csum() and get rid of size
>     assumptions
>   btrfs: add boilerplate code for directly including the crypto
>     framework
>   btrfs: directly call into crypto framework for checsumming
>   btrfs: remove assumption about csum type form
>     btrfs_print_data_csum_error()
>   btrfs: add sha256 as another checksum algorithm

1-5 are reviewed and ok, 6 and 13 should be reworked, 7-12 are ok. I
can't put the branch to next yet due to the csum formatting "issues" but
will do once you resend. Resending just 6 and 13 should be ok, as they're
independent.

Johannes Thumshirn June 3, 2019, 9:38 a.m. UTC | #2
On Mon, May 27, 2019 at 07:19:54PM +0200, David Sterba wrote:
> 1-5 are reviewed and ok, 6 and 13 should be reworked, 7-12 are ok. I
> can't put the branch to next yet due to the csum formatting "issues" but
> will do once you resend. Resending just 6 and 13 should be ok, as they're
> independent.

I'd still like to hold back 13/13. SHA-256 doesn't seem to be well received by
the community as the "slow" hash and using a plain SHA-256 is not sufficient
for the dm-verity/fs-verity like approach I intend to implement in subsequent
patches.

For the record, the current idea is to use an HMAC(SHA-256) as checksum
algorithm with a key provided at mkfs and mount time.
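
As a rough illustration of what a keyed checksum changes compared to a plain
hash, here is a minimal userspace sketch in Python using the standard hmac
module. The helper names and the 32-byte field constant are assumptions made
for this example and do not come from any posted patch. How the key would
actually be passed at mkfs and mount time is not specified in this thread and
is not modelled here.

```python
import hashlib
import hmac

# Size of the on-disk checksum field in bytes (from the cover letter).
CSUM_FIELD_SIZE = 32


def keyed_csum(key: bytes, block: bytes) -> bytes:
    """Compute HMAC(SHA-256) over one data block with the given key."""
    return hmac.new(key, block, hashlib.sha256).digest()


def verify_block(key: bytes, block: bytes, stored: bytes) -> bool:
    """Recompute the keyed checksum and compare it in constant time."""
    return hmac.compare_digest(keyed_csum(key, block), stored)
```

The output is still 32 bytes, so it fits the same checksum field as plain
SHA-256. The difference is that someone who can modify a block but does not
know the key cannot produce a matching checksum, which a plain SHA-256 does
not prevent.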
David Sterba June 3, 2019, 12:40 p.m. UTC | #3
On Mon, Jun 03, 2019 at 11:38:40AM +0200, Johannes Thumshirn wrote:
> On Mon, May 27, 2019 at 07:19:54PM +0200, David Sterba wrote:
> > 1-5 are reviewed and ok, 6 and 13 should be reworked, 7-12 are ok. I
> > can't put the branch to next yet due to the csum formatting "issues" but
> > will do once you resend. Resending just 6 and 13 should be ok, as they're
> > independent.
> 
> I'd still like to hold back 13/13. SHA-256 doesn't seem to be well received by
> the community as the "slow" hash and using a plain SHA-256 is not sufficient
> for the dm-verity/fs-verity like approach I intend to implement in subsequent
> patches.
> 
> For the record, the current idea is to use an HMAC(SHA-256) as checksum
> algorithm with a key provided at mkfs and mount time.

The patch actually adding the new hash won't be merged to any
to-be-released branch until we have the final list, but for testing
purposes the patch will be in for-next and available via linux-next.