Message ID | cover.1621276134.git.osandov@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | fs: interface for directly reading/writing compressed data | expand |
On Mon, May 17, 2021 at 11:35 AM Omar Sandoval <osandov@osandov.com> wrote: > > Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 > are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and > patch 9 adds Btrfs encoded write support. I don't love the RWF_ENCODED flag, but if that's the way people think this should be done, as a model this looks reasonable to me. I'm not sure what the deal with the encryption metadata is. I realize there is currently only one encryption type ("none") in this series, but it's not clear how any other encryption type would actually ever be described. It's not like you can pass in the key (well, I guess passing in the key would be fine, but passing it back out certainly would not be). A key ID from a keyring? So there's presumably some future plan for it, but it would be good to verify that that plan makes sense.. Linus
On Mon, May 17, 2021 at 02:32:47PM -0700, Linus Torvalds wrote: > On Mon, May 17, 2021 at 11:35 AM Omar Sandoval <osandov@osandov.com> wrote: > > > > Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 > > are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and > > patch 9 adds Btrfs encoded write support. > > I don't love the RWF_ENCODED flag, but if that's the way people think > this should be done, as a model this looks reasonable to me. > > I'm not sure what the deal with the encryption metadata is. I realize > there is currently only one encryption type ("none") in this series, > but it's not clear how any other encryption type would actually ever > be described. It's not like you can pass in the key (well, I guess > passing in the key would be fine, but passing it back out certainly > would not be). A key ID from a keyring? > > So there's presumably some future plan for it, but it would be good to > verify that that plan makes sense.. What I'm imagining for fscrypt is: 1. Add ENCODED_IOV_ENCRYPTION_* types for fscrypt. Consumers at least need to be able to distinguish between encryption policy versions, DIRECT_KEY policies, and IV_INO_LBLK_{64,32} policies, and maybe other details. 2. Use RWF_ENCODED only for the data itself. 3. Add new fscrypt ioctls to get and set the encryption key. The interesting part is (3). If I'm reading the fscrypt documentation correctly, in the default mode, each file is encrypted with a per-file key that is a function of the master key for the directory tree and a per-file nonce. Userspace manages the master key, we have a FS_IOC_GET_ENCRYPTION_NONCE ioctl, and the key derivation function is documented. So, userspace already has all of the pieces it needs to get the encryption key, and all of the information it needs to decrypt the data it gets from RWF_ENCODED if it so desires. On the set/write side, the user can set the same master key and policy with FS_IOC_SET_ENCRYPTION_POLICY, and we'd need something like an FS_IOC_SET_ENCRYPTION_NONCE ioctl (possibly with a requirement that it be set when the file is empty). I think that's it. The details will vary for the other fscrypt policies, but that's the gist of it. I added the fscrypt maintainers to correct me if I missed something. Thanks, Omar
On Mon, May 17, 2021 at 03:27:48PM -0700, Omar Sandoval wrote: > On Mon, May 17, 2021 at 02:32:47PM -0700, Linus Torvalds wrote: > > On Mon, May 17, 2021 at 11:35 AM Omar Sandoval <osandov@osandov.com> wrote: > > > > > > Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 > > > are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and > > > patch 9 adds Btrfs encoded write support. > > > > I don't love the RWF_ENCODED flag, but if that's the way people think > > this should be done, as a model this looks reasonable to me. > > > > I'm not sure what the deal with the encryption metadata is. I realize > > there is currently only one encryption type ("none") in this series, > > but it's not clear how any other encryption type would actually ever > > be described. It's not like you can pass in the key (well, I guess > > passing in the key would be fine, but passing it back out certainly > > would not be). A key ID from a keyring? > > > > So there's presumably some future plan for it, but it would be good to > > verify that that plan makes sense.. > > What I'm imagining for fscrypt is: > > 1. Add ENCODED_IOV_ENCRYPTION_* types for fscrypt. Consumers at least > need to be able to distinguish between encryption policy versions, > DIRECT_KEY policies, and IV_INO_LBLK_{64,32} policies, and maybe > other details. > 2. Use RWF_ENCODED only for the data itself. > 3. Add new fscrypt ioctls to get and set the encryption key. > > The interesting part is (3). If I'm reading the fscrypt documentation > correctly, in the default mode, each file is encrypted with a per-file > key that is a function of the master key for the directory tree and a > per-file nonce. > > Userspace manages the master key, we have a FS_IOC_GET_ENCRYPTION_NONCE > ioctl, and the key derivation function is documented. So, userspace > already has all of the pieces it needs to get the encryption key, and > all of the information it needs to decrypt the data it gets from > RWF_ENCODED if it so desires. > > On the set/write side, the user can set the same master key and policy > with FS_IOC_SET_ENCRYPTION_POLICY, and we'd need something like an > FS_IOC_SET_ENCRYPTION_NONCE ioctl (possibly with a requirement that it > be set when the file is empty). I think that's it. > > The details will vary for the other fscrypt policies, but that's the > gist of it. I added the fscrypt maintainers to correct me if I missed > something. > Well, assuming we're talking about regular files only (so file contents encryption, not filenames encryption), with fscrypt the information needed to understand a file's encrypted data is the following: 1. The encryption key 2. The filesystem's block size 3. The encryption context: struct fscrypt_context_v2 { u8 version; /* FSCRYPT_CONTEXT_V2 */ u8 contents_encryption_mode; u8 filenames_encryption_mode; u8 flags; u8 __reserved[4]; u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]; u8 nonce[FSCRYPT_FILE_NONCE_SIZE]; }; (Or alternatively struct fscrypt_policy_v2 + the nonce field separately; that results in the same fields as struct fscrypt_context_v2.) This is definitely more complex than the compression cases like "the data is a zlib stream". So the question is, how much of this metadata (if any) should actually be passed around during RWF_ENCODED pread/pwrite operations, and how much should be out-of-band. I feel like this should be mostly out-of-band (e.g. via the existing ioctls FS_IOC_{GET,SET}_ENCRYPTION_POLICY), especially given that compression and encryption could be combined which would make describing the on-disk data even more difficult. But I'm not sure what you intended. - Eric
On Mon, May 17, 2021 at 03:48:30PM -0700, Eric Biggers wrote: > On Mon, May 17, 2021 at 03:27:48PM -0700, Omar Sandoval wrote: > > On Mon, May 17, 2021 at 02:32:47PM -0700, Linus Torvalds wrote: > > > On Mon, May 17, 2021 at 11:35 AM Omar Sandoval <osandov@osandov.com> wrote: > > > > > > > > Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 > > > > are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and > > > > patch 9 adds Btrfs encoded write support. > > > > > > I don't love the RWF_ENCODED flag, but if that's the way people think > > > this should be done, as a model this looks reasonable to me. > > > > > > I'm not sure what the deal with the encryption metadata is. I realize > > > there is currently only one encryption type ("none") in this series, > > > but it's not clear how any other encryption type would actually ever > > > be described. It's not like you can pass in the key (well, I guess > > > passing in the key would be fine, but passing it back out certainly > > > would not be). A key ID from a keyring? > > > > > > So there's presumably some future plan for it, but it would be good to > > > verify that that plan makes sense.. > > > > What I'm imagining for fscrypt is: > > > > 1. Add ENCODED_IOV_ENCRYPTION_* types for fscrypt. Consumers at least > > need to be able to distinguish between encryption policy versions, > > DIRECT_KEY policies, and IV_INO_LBLK_{64,32} policies, and maybe > > other details. > > 2. Use RWF_ENCODED only for the data itself. > > 3. Add new fscrypt ioctls to get and set the encryption key. > > > > The interesting part is (3). If I'm reading the fscrypt documentation > > correctly, in the default mode, each file is encrypted with a per-file > > key that is a function of the master key for the directory tree and a > > per-file nonce. > > > > Userspace manages the master key, we have a FS_IOC_GET_ENCRYPTION_NONCE > > ioctl, and the key derivation function is documented. So, userspace > > already has all of the pieces it needs to get the encryption key, and > > all of the information it needs to decrypt the data it gets from > > RWF_ENCODED if it so desires. > > > > On the set/write side, the user can set the same master key and policy > > with FS_IOC_SET_ENCRYPTION_POLICY, and we'd need something like an > > FS_IOC_SET_ENCRYPTION_NONCE ioctl (possibly with a requirement that it > > be set when the file is empty). I think that's it. > > > > The details will vary for the other fscrypt policies, but that's the > > gist of it. I added the fscrypt maintainers to correct me if I missed > > something. > > > > Well, assuming we're talking about regular files only (so file contents > encryption, not filenames encryption), Yes, I was thinking of regular files. File operations using encrypted names sounds... interesting, but I think out of scope for this. > with fscrypt the information needed to > understand a file's encrypted data is the following: > > 1. The encryption key > > 2. The filesystem's block size > > 3. The encryption context: > > struct fscrypt_context_v2 { > u8 version; /* FSCRYPT_CONTEXT_V2 */ > u8 contents_encryption_mode; > u8 filenames_encryption_mode; > u8 flags; > u8 __reserved[4]; > u8 master_key_identifier[FSCRYPT_KEY_IDENTIFIER_SIZE]; > u8 nonce[FSCRYPT_FILE_NONCE_SIZE]; > }; > > (Or alternatively struct fscrypt_policy_v2 + the nonce field separately; > that results in the same fields as struct fscrypt_context_v2.) > > This is definitely more complex than the compression cases like "the data is a > zlib stream". So the question is, how much of this metadata (if any) should > actually be passed around during RWF_ENCODED pread/pwrite operations, and how > much should be out-of-band. > > I feel like this should be mostly out-of-band (e.g. via the existing ioctls > FS_IOC_{GET,SET}_ENCRYPTION_POLICY), especially given that compression and > encryption could be combined which would make describing the on-disk data even > more difficult. > > But I'm not sure what you intended. Okay, I think we're in agreement: RWF_ENCODED for the data and separate ioctls for the encryption context. Since the fscrypt policy struct includes all of the relevant information, RWF_ENCODED can probably just have a single ENCODED_IOV_ENCRYPTION_FSCRYPT encryption type. RWF_ENCODED can express data which is both compressed and encrypted, so that should be fine as well. The only other missing piece that I see (other than filesystem support) is an FS_IOC_SET_ENCRYPTION_NONCE ioctl. Would such an interface be reasonable?
On Mon, May 17, 2021 at 04:25:15PM -0700, Omar Sandoval wrote: > > Okay, I think we're in agreement: RWF_ENCODED for the data and separate > ioctls for the encryption context. Since the fscrypt policy struct > includes all of the relevant information, RWF_ENCODED can probably just > have a single ENCODED_IOV_ENCRYPTION_FSCRYPT encryption type. > RWF_ENCODED can express data which is both compressed and encrypted, so > that should be fine as well. > > The only other missing piece that I see (other than filesystem support) > is an FS_IOC_SET_ENCRYPTION_NONCE ioctl. Would such an interface be > reasonable? In theory, it will be possible to add FS_IOC_SET_ENCRYPTION_NONCE. The implementation might be tricky. It would have to take the inode lock, verify that the file is empty, replace the encryption xattr, and re-derive and replace the file's encryption key. Replacing the key should be safe because the file is empty, but it's hard to be sure -- and what about directories? Another concern is that userspace could misuse this ioctl and somehow end up reusing nonces, which would be bad; probably this should be a CAP_SYS_ADMIN thing only. A larger question is whether the goal is to support users backing up and restoring encrypted files without their encryption key being available -- in which case things would become *much* harder. First because of the filenames encryption, and second because we currently don't allow opening files without their encryption key. - Eric
On Mon, May 17, 2021 at 04:25:15PM -0700, Omar Sandoval wrote: > > Well, assuming we're talking about regular files only (so file contents > > encryption, not filenames encryption), > > Yes, I was thinking of regular files. File operations using encrypted > names sounds... interesting, but I think out of scope for this. So the question I have is why would you want to get the raw encrypted data? One possible reason (and this is what Michael Halwcrow and I had tried designing years ago) was so you could backup a device that had multiple users' files without having all of the users' keys --- and then be able to restore them. So for example, suppose you had a tablet that is shared by multiple family members, and you want to be backup all of the data on the shared device so that it could be restored in case one of the kids drop the tablet in the swimming pool.... But in order to do that, you need to be able to restore the encrypted files in the encrypted directories. In practice, encrypted files generally exist in encrypted directories. That's because the typical way fscrypt gets used is we set a policy on an empty directory, and then all of the newly files created files have encrypted file names, inherit the directory's encryption policy, and then have encrypted file contents. So do you have the encryption key, or not? If you do have the encryption key, then you can ignore the issue of the file name when you open the file, but what's the reason why you would want to extract out the raw encrypted data plus the raw encryption metadata? You're not going to be able to restore the encrypted file, in the encrypted directory name. Perhaps it's because you want to keep the data encrypted while you're tranferring it --- but the filename needs to be encrypted as well, and given modern CPU's, with or without inline-crypto engines, the cost of decrypting the file data and then re-encrypting it in the backup key isn't really that large. If you don't have the encryption key, then you need to be able to open the file using using the encrypted name (which fscrypt does support) and then extract out the encrypted file name using another bundle of encryption metadata. So that's a bit more complicated, but it's doable. The *really* hard part is *restoring* an encrypted directory hierarchy. Michael and I did create a straw design proposal (which is too small to fit in the margins of this e-mail :-), but suffice it to say that the standard Posix system calls are not sufficient to be able to create encrypted files and encrypted directories, and it would have been messy as all hell. Which is why we breathed a sign of relief when the original product requirement of being able to do backup/restore of shared devices went away. :-) The thing is, though, just being able to extract out regular files in their raw encrypted on-disk form, along with their filename metadata, seems to be a bit of a party trick without a compelling use case that I can see. But perhaps you have something in mind? - Ted
On Mon, May 17, 2021 at 10:53:16PM -0400, Theodore Y. Ts'o wrote: > On Mon, May 17, 2021 at 04:25:15PM -0700, Omar Sandoval wrote: > > > Well, assuming we're talking about regular files only (so file contents > > > encryption, not filenames encryption), > > > > Yes, I was thinking of regular files. File operations using encrypted > > names sounds... interesting, but I think out of scope for this. > > So the question I have is why would you want to get the raw encrypted > data? One possible reason (and this is what Michael Halwcrow and I > had tried designing years ago) was so you could backup a device that > had multiple users' files without having all of the users' keys --- > and then be able to restore them. So for example, suppose you had a > tablet that is shared by multiple family members, and you want to be > backup all of the data on the shared device so that it could be > restored in case one of the kids drop the tablet in the swimming pool.... > > But in order to do that, you need to be able to restore the encrypted > files in the encrypted directories. In practice, encrypted files > generally exist in encrypted directories. That's because the typical > way fscrypt gets used is we set a policy on an empty directory, and > then all of the newly files created files have encrypted file names, > inherit the directory's encryption policy, and then have encrypted > file contents. > > So do you have the encryption key, or not? If you do have the > encryption key, then you can ignore the issue of the file name when > you open the file, but what's the reason why you would want to extract > out the raw encrypted data plus the raw encryption metadata? You're > not going to be able to restore the encrypted file, in the encrypted > directory name. Perhaps it's because you want to keep the data > encrypted while you're tranferring it --- but the filename needs to be > encrypted as well, and given modern CPU's, with or without > inline-crypto engines, the cost of decrypting the file data and then > re-encrypting it in the backup key isn't really that large. > > If you don't have the encryption key, then you need to be able to open > the file using using the encrypted name (which fscrypt does support) > and then extract out the encrypted file name using another bundle of > encryption metadata. So that's a bit more complicated, but it's > doable. > > The *really* hard part is *restoring* an encrypted directory > hierarchy. Michael and I did create a straw design proposal (which is > too small to fit in the margins of this e-mail :-), but suffice it to > say that the standard Posix system calls are not sufficient to be able > to create encrypted files and encrypted directories, and it would have > been messy as all hell. Which is why we breathed a sign of relief > when the original product requirement of being able to do > backup/restore of shared devices went away. :-) > > The thing is, though, just being able to extract out regular files in > their raw encrypted on-disk form, along with their filename metadata, > seems to be a bit of a party trick without a compelling use case that > I can see. But perhaps you have something in mind? Thanks for the detailed response, Ted. I personally don't have a use case for reading and writing encrypted data. I only care about skipping compression/decompression, but early on it was pointed out that this API could potentially also be used for encrypted data. The question at this point is: if someone else comes along and solves the problems with restoring encrypted filenames, is this interface enough for them to restore the encrypted file data? It seems like the answer is yes, with a couple of additions to fscrypt. I should've been clearer that I don't have concrete plans to do this, I just wanted to leave the door open for it so that we don't need a second, very similar interface. Thanks, Omar
On Tue, May 18, 2021 at 01:38:08AM -0700, Omar Sandoval wrote: > Thanks for the detailed response, Ted. I personally don't have a use > case for reading and writing encrypted data. I only care about skipping > compression/decompression, but early on it was pointed out that this API > could potentially also be used for encrypted data. The question at this > point is: if someone else comes along and solves the problems with > restoring encrypted filenames, is this interface enough for them to > restore the encrypted file data? It seems like the answer is yes, with a > couple of additions to fscrypt. I should've been clearer that I don't > have concrete plans to do this, I just wanted to leave the door open for > it so that we don't need a second, very similar interface. Well, practically speaking, we would need to have a way to extract out the encrypted file name information; and given that an encrypted file could have hard links, we need to be able to obtain the encrypted file name information for the dentry that was used to open that file. This arguably should be separate from the encryption information for data stream itself, so if we want to handwave how we fetch the encrypted filename info (maybe some magic ioctl, or maybe via using some kind of magic RWF flag used for reading encrypted directories that are opened via O_DIRECTORY, which sorta-works like readdir() but also returns some additional metadata information for each directory entry), sure it should be possible to use your proposed interface as a starting point. I'm not sure we want to try to design all of the details of how to get the encrypted data plus encryption metadata for the data stream, but in theory, so long as there is a way to get the encryption metadata, sure, it could work. One other approach is to just abuse the xattr interface, the way we do with Posix ACL's and Capabilities, where the on-disk format and the format used when we query a file's ACL via the xattr interface don't necessary have to be identical. I'm sure that will result in howls of outrage in some quarters, but this is something for which we have a precedent. Cheers, - Ted
On Mon, May 17, 2021 at 02:32:47PM -0700, Linus Torvalds wrote: > On Mon, May 17, 2021 at 11:35 AM Omar Sandoval <osandov@osandov.com> wrote: > > > > Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 > > are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and > > patch 9 adds Btrfs encoded write support. > > I don't love the RWF_ENCODED flag, but if that's the way people think > this should be done, as a model this looks reasonable to me. > > I'm not sure what the deal with the encryption metadata is. I realize > there is currently only one encryption type ("none") in this series, > but it's not clear how any other encryption type would actually ever > be described. It's not like you can pass in the key (well, I guess > passing in the key would be fine, but passing it back out certainly > would not be). A key ID from a keyring? > > So there's presumably some future plan for it, but it would be good to > verify that that plan makes sense.. To summarize the discussion and answer your original question, using RWF_ENCODED for encryption will require additional support for getting encryption metadata, but that support is best suited for a separate interface, with RWF_ENCODED purely for the encrypted data itself. The harder part of encrypted backups is restoring filenames, and that would also be best as a separate interface. My use case is only for compression, so none of that is a blocker for RWF_ENCODED. What else can I do to move this along? Thanks, Omar
From: Omar Sandoval <osandov@fb.com> This series adds an API for reading compressed data on a filesystem without decompressing it as well as support for writing compressed data directly to the filesystem. I have test cases (including fsstress support) and example programs which I'll send up once the dust settles [1]. The main use-case is Btrfs send/receive: currently, when sending data from one compressed filesystem to another, the sending side decompresses the data and the receiving side recompresses it before writing it out. This is wasteful and can be avoided if we can just send and write compressed extents. The patches implementing the send/receive support were sent with the last submission of this series [2]. Patches 1-3 add the VFS support, UAPI, and documentation. Patches 4-7 are Btrfs prep patches. Patch 8 adds Btrfs encoded read support and patch 9 adds Btrfs encoded write support. These patches are based on Dave Sterba's Btrfs misc-next branch [3], which is in turn currently based on v5.13-rc2. This is a _resend of a resend_ of v9 [4], rebased on the latest kdave/misc-next branch. 1: https://github.com/osandov/xfstests/tree/rwf-encoded 2: https://lore.kernel.org/linux-btrfs/cover.1615922753.git.osandov@fb.com/ 3: https://github.com/kdave/btrfs-devel/tree/misc-next 4: https://lore.kernel.org/linux-fsdevel/cover.1619463858.git.osandov@fb.com/ Omar Sandoval (9): iov_iter: add copy_struct_from_iter() fs: add O_ALLOW_ENCODED open flag fs: add RWF_ENCODED for reading/writing compressed data btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() btrfs: add ram_bytes and offset to btrfs_ordered_extent btrfs: support different disk extent size for delalloc btrfs: optionally extend i_size in cow_file_range_inline() btrfs: implement RWF_ENCODED reads btrfs: implement RWF_ENCODED writes Documentation/filesystems/encoded_io.rst | 240 ++++++ Documentation/filesystems/index.rst | 1 + arch/alpha/include/uapi/asm/fcntl.h | 1 + arch/parisc/include/uapi/asm/fcntl.h | 1 + arch/sparc/include/uapi/asm/fcntl.h | 1 + fs/btrfs/compression.c | 12 +- fs/btrfs/compression.h | 6 +- fs/btrfs/ctree.h | 9 +- fs/btrfs/delalloc-space.c | 18 +- fs/btrfs/file-item.c | 35 +- fs/btrfs/file.c | 46 +- fs/btrfs/inode.c | 929 +++++++++++++++++++++-- fs/btrfs/ordered-data.c | 124 +-- fs/btrfs/ordered-data.h | 25 +- fs/btrfs/relocation.c | 4 +- fs/fcntl.c | 10 +- fs/namei.c | 4 + fs/read_write.c | 168 +++- include/linux/encoded_io.h | 17 + include/linux/fcntl.h | 2 +- include/linux/fs.h | 13 + include/linux/uio.h | 1 + include/uapi/asm-generic/fcntl.h | 4 + include/uapi/linux/encoded_io.h | 30 + include/uapi/linux/fs.h | 5 +- lib/iov_iter.c | 91 +++ 26 files changed, 1563 insertions(+), 234 deletions(-) create mode 100644 Documentation/filesystems/encoded_io.rst create mode 100644 include/linux/encoded_io.h create mode 100644 include/uapi/linux/encoded_io.h