Message ID | 20221223203638.41293-11-ebiggers@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Series | fsverity: support for non-4K pages |
On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <ebiggers@kernel.org> wrote:

> After each filesystem block (as represented by a buffer_head) has been
> read from disk by block_read_full_folio(), verify it if needed. The
> verification is done on the fsverity_read_workqueue. Also allow reads
> of verity metadata past i_size, as required by ext4.

Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
any other subsystems feel a need to do this?

> This is needed to support fsverity on ext4 filesystems where the
> filesystem block size is less than the page size.

Does any real person actually do this?
On Mon, Jan 09, 2023 at 06:37:59PM -0800, Andrew Morton wrote:
> On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <ebiggers@kernel.org> wrote:
>
> > After each filesystem block (as represented by a buffer_head) has been
> > read from disk by block_read_full_folio(), verify it if needed. The
> > verification is done on the fsverity_read_workqueue. Also allow reads
> > of verity metadata past i_size, as required by ext4.
>
> Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
> any other subsystems feel a need to do this?

ext4 is currently the only filesystem that uses block_read_full_folio() and that
supports fsverity. However, since fsverity has a common infrastructure across
filesystems, in fs/verity/, it makes sense to support it in the other filesystem
infrastructure so that things aren't mutually exclusive for no reason.

Note that this applies to fscrypt too, which block_read_full_folio() (previously
block_read_full_page()) already supports since v5.5.

If you'd prefer that block_read_full_folio() be copied into ext4, then modified
to support fscrypt and fsverity, and then the fscrypt support removed from the
original copy, we could do that. That seems more like a workaround to avoid
modifying certain files than an actually better solution, but it could be done.

>
> > This is needed to support fsverity on ext4 filesystems where the
> > filesystem block size is less than the page size.
>
> Does any real person actually do this?

Yes, on systems with the page size larger than 4K, the ext4 filesystem block
size is often smaller than the page size. ext4 encryption (fscrypt) originally
had the same limitation, and Chandan Rajendra from IBM did significant work to
solve it a few years ago, with the changes landing in v5.5.

- Eric
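To make the block-size point concrete, here is a small user-space sketch (not kernel code) of how many filesystem blocks, and therefore how many buffer_heads, cover a single page. The helper name `blocks_per_page` is hypothetical and the page and block sizes in the usage note are illustrative assumptions, not values taken from the thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper: number of filesystem blocks that cover one page.
 * Assumes block_size is non-zero and evenly divides page_size.
 */
static unsigned int blocks_per_page(unsigned int page_size,
				    unsigned int block_size)
{
	return page_size / block_size;
}

/* Print the result for one assumed page/block size combination. */
static void show_blocks_per_page(unsigned int page_size,
				 unsigned int block_size)
{
	printf("page=%u block=%u -> %u blocks per page\n",
	       page_size, block_size, blocks_per_page(page_size, block_size));
}
```

Calling `show_blocks_per_page(65536, 4096)` models a 64K-page system with 4K ext4 blocks: each page is then backed by several blocks, and each block would need to be verified on its own, which is the case the patch addresses.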
On Mon, Jan 09, 2023 at 07:05:07PM -0800, Eric Biggers wrote:
> On Mon, Jan 09, 2023 at 06:37:59PM -0800, Andrew Morton wrote:
> > On Fri, 23 Dec 2022 12:36:37 -0800 Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > After each filesystem block (as represented by a buffer_head) has been
> > > read from disk by block_read_full_folio(), verify it if needed. The
> > > verification is done on the fsverity_read_workqueue. Also allow reads
> > > of verity metadata past i_size, as required by ext4.
> >
> > Sigh. Do we reeeeealy need to mess with buffer.c in this fashion? Did
> > any other subsystems feel a need to do this?
>
> ext4 is currently the only filesystem that uses block_read_full_folio() and that
> supports fsverity. However, since fsverity has a common infrastructure across
> filesystems, in fs/verity/, it makes sense to support it in the other filesystem
> infrastructure so that things aren't mutually exclusive for no reason.
>
> Note that this applies to fscrypt too, which block_read_full_folio() (previously
> block_read_full_page()) already supports since v5.5.
>
> If you'd prefer that block_read_full_folio() be copied into ext4, then modified
> to support fscrypt and fsverity, and then the fscrypt support removed from the
> original copy, we could do that. That seems more like a workaround to avoid
> modifying certain files than an actually better solution, but it could be done.
>
> >
> > > This is needed to support fsverity on ext4 filesystems where the
> > > filesystem block size is less than the page size.
> >
> > Does any real person actually do this?
>
> Yes, on systems with the page size larger than 4K, the ext4 filesystem block
> size is often smaller than the page size. ext4 encryption (fscrypt) originally
> had the same limitation, and Chandan Rajendra from IBM did significant work to
> solve it a few years ago, with the changes landing in v5.5.
>
> - Eric

Any more thoughts on this from Andrew, the ext4 maintainers, or anyone else?

- Eric
On Fri, Jan 20, 2023 at 11:56:45AM -0800, Eric Biggers wrote:
> Any more thoughts on this from Andrew, the ext4 maintainers, or anyone else?
As someone else: I very much prefer to support common functionality
(fsverity) in common helpers rather than copying and pasting them into
various file systems. The "copy a common helper and slightly modify it"
approach is a cancer infecting various file systems that makes it really
hard to maintain the kernel.
diff --git a/fs/buffer.c b/fs/buffer.c
index d9c6d1fbb6dde..2e65ba2b3919b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -48,6 +48,7 @@
 #include <linux/sched/mm.h>
 #include <trace/events/block.h>
 #include <linux/fscrypt.h>
+#include <linux/fsverity.h>
 
 #include "internal.h"
 
@@ -295,20 +296,52 @@ static void end_buffer_async_read(struct buffer_head *bh, int uptodate)
 	return;
 }
 
-struct decrypt_bh_ctx {
+struct postprocess_bh_ctx {
 	struct work_struct work;
 	struct buffer_head *bh;
 };
 
+static void verify_bh(struct work_struct *work)
+{
+	struct postprocess_bh_ctx *ctx =
+		container_of(work, struct postprocess_bh_ctx, work);
+	struct buffer_head *bh = ctx->bh;
+	bool valid;
+
+	valid = fsverity_verify_blocks(bh->b_page, bh->b_size, bh_offset(bh));
+	end_buffer_async_read(bh, valid);
+	kfree(ctx);
+}
+
+static bool need_fsverity(struct buffer_head *bh)
+{
+	struct page *page = bh->b_page;
+	struct inode *inode = page->mapping->host;
+
+	return fsverity_active(inode) &&
+		/* needed by ext4 */
+		page->index < DIV_ROUND_UP(inode->i_size, PAGE_SIZE);
+}
+
 static void decrypt_bh(struct work_struct *work)
 {
-	struct decrypt_bh_ctx *ctx =
-		container_of(work, struct decrypt_bh_ctx, work);
+	struct postprocess_bh_ctx *ctx =
+		container_of(work, struct postprocess_bh_ctx, work);
 	struct buffer_head *bh = ctx->bh;
 	int err;
 
 	err = fscrypt_decrypt_pagecache_blocks(bh->b_page, bh->b_size,
 					       bh_offset(bh));
+	if (err == 0 && need_fsverity(bh)) {
+		/*
+		 * We use different work queues for decryption and for verity
+		 * because verity may require reading metadata pages that need
+		 * decryption, and we shouldn't recurse to the same workqueue.
+		 */
+		INIT_WORK(&ctx->work, verify_bh);
+		fsverity_enqueue_verify_work(&ctx->work);
+		return;
+	}
 	end_buffer_async_read(bh, err == 0);
 	kfree(ctx);
 }
@@ -319,15 +352,24 @@ static void decrypt_bh(struct work_struct *work)
  */
 static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
 {
-	/* Decrypt if needed */
-	if (uptodate &&
-	    fscrypt_inode_uses_fs_layer_crypto(bh->b_page->mapping->host)) {
-		struct decrypt_bh_ctx *ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
+	struct inode *inode = bh->b_page->mapping->host;
+	bool decrypt = fscrypt_inode_uses_fs_layer_crypto(inode);
+	bool verify = need_fsverity(bh);
+
+	/* Decrypt (with fscrypt) and/or verify (with fsverity) if needed. */
+	if (uptodate && (decrypt || verify)) {
+		struct postprocess_bh_ctx *ctx =
+			kmalloc(sizeof(*ctx), GFP_ATOMIC);
 
 		if (ctx) {
-			INIT_WORK(&ctx->work, decrypt_bh);
 			ctx->bh = bh;
-			fscrypt_enqueue_decrypt_work(&ctx->work);
+			if (decrypt) {
+				INIT_WORK(&ctx->work, decrypt_bh);
+				fscrypt_enqueue_decrypt_work(&ctx->work);
+			} else {
+				INIT_WORK(&ctx->work, verify_bh);
+				fsverity_enqueue_verify_work(&ctx->work);
+			}
 			return;
 		}
 		uptodate = 0;
@@ -2245,6 +2287,11 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 	int nr, i;
 	int fully_mapped = 1;
 	bool page_error = false;
+	loff_t limit = i_size_read(inode);
+
+	/* This is needed for ext4. */
+	if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode))
+		limit = inode->i_sb->s_maxbytes;
 
 	VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
 
@@ -2253,7 +2300,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block)
 	bbits = block_size_bits(blocksize);
 
 	iblock = (sector_t)folio->index << (PAGE_SHIFT - bbits);
-	lblock = (i_size_read(inode)+blocksize-1) >> bbits;
+	lblock = (limit+blocksize-1) >> bbits;
 	bh = head;
 	nr = 0;
 	i = 0;
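The two bounds the patch computes can be tried out in user space. The following is a rough sketch, not kernel code: `last_block` mirrors the `lblock = (limit+blocksize-1) >> bbits` expression, and `page_holds_data` mirrors the page-index comparison in need_fsverity(). Both helper names are hypothetical, the local `DIV_ROUND_UP` is a stand-in for the kernel macro, and the file size, block size, and `s_maxbytes` value in the usage note are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round-up division, a user-space stand-in for the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper: index of the first block past the read limit,
 * following "lblock = (limit+blocksize-1) >> bbits" from the patch.
 */
static uint64_t last_block(uint64_t limit, unsigned int bbits)
{
	uint64_t blocksize = 1ULL << bbits;

	return (limit + blocksize - 1) >> bbits;
}

/*
 * Hypothetical helper: models the page-index comparison in need_fsverity().
 * Only pages that start before i_size (rounded up to a page) qualify, so
 * pages lying entirely past i_size are excluded.
 */
static bool page_holds_data(uint64_t page_index, uint64_t i_size,
			    uint64_t page_size)
{
	return page_index < DIV_ROUND_UP(i_size, page_size);
}
```

With an assumed 10000-byte file and 1K blocks (`bbits` = 10), `last_block(10000, 10)` gives 10, so only blocks 0 through 9 fall under the limit. Substituting a much larger limit, as the patch does with `s_maxbytes` for verity inodes, pushes that bound far past i_size, which is what lets ext4 read the verity metadata stored there. In the same example with 4K pages, `page_holds_data` accepts pages 0 through 2 and rejects page 3 onward.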