Message ID | 171175868064.1987804.7068231057141413548.stgit@frogsfrogsfrogs |
---|---|
State | New, archived |
Series | [01/13] fs: add FS_XFLAG_VERITY for verity files |
On Fri, Mar 29, 2024 at 05:35:48PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Now that we've made the verity workqueue per-superblock, we don't need
> the systemwide workqueue. Get rid of the old implementation.

This commit message needs to be rephrased because this commit isn't just
removing unused code.  It's also converting ext4 and f2fs over to the new
workqueue type.  (Maybe these two parts belong as separate patches?)

Also, if there are any changes in the workqueue flags that are being used for
ext4 and f2fs, that needs to be documented.

- Eric
On Thu, Apr 04, 2024 at 11:14:07PM -0400, Eric Biggers wrote:
> On Fri, Mar 29, 2024 at 05:35:48PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Now that we've made the verity workqueue per-superblock, we don't need
> > the systemwide workqueue. Get rid of the old implementation.
>
> This commit message needs to be rephrased because this commit isn't just
> removing unused code.  It's also converting ext4 and f2fs over to the new
> workqueue type.  (Maybe these two parts belong as separate patches?)

Yes, will fix that.

> Also, if there are any changes in the workqueue flags that are being used for
> ext4 and f2fs, that needs to be documented.

Hmm.  The current codebase does this:

        fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
                                                  WQ_HIGHPRI,
                                                  num_online_cpus());

Looking at commit f959325e6ac3 ("fsverity: Remove WQ_UNBOUND from
fsverity read workqueue"), I guess you want a bound workqueue so that
the CPU that handles the readahead ioend will also handle the verity
validation?

Why do you set max_active to num_online_cpus()?  Is that because the
verity hash is (probably?) being computed on the CPUs, and there's only
so many of those to go around, so there's little point in making more?
Or is it to handle systems with more than WQ_DFL_ACTIVE (~256) CPUs?
Maybe there's a different reason?

If you add more CPUs to the system later, does this now constrain the
number of CPUs that can be participating in verity validation?  Why not
let the system try to process as many read ioends as are ready to be
processed, rather than introducing a constraint here?

As for WQ_HIGHPRI, I wish Dave or Andrey would chime in on why this
isn't appropriate for XFS.  I think they have a reason for this, but the
most I can do is speculate that it's to avoid blocking other things in
the system.

In Andrey's V5 patch, XFS creates its own workqueue like this:
https://lore.kernel.org/linux-xfs/20240304191046.157464-10-aalbersh@redhat.com/

        struct workqueue_struct *wq = alloc_workqueue(
                "pread/%s", (WQ_FREEZABLE | WQ_MEM_RECLAIM), 0, sb->s_id);

I don't grok this either -- read ioend workqueues aren't usually
involved in memory reclaim at all, and I can't see why you'd want to
freeze the verity workqueue during suspend.  Reads are allowed on frozen
filesystems, so I don't see why verity would be any different.

In the end, I think I might change this to:

        int __fsverity_init_verify_wq(struct super_block *sb,
                                      unsigned int wq_flags,
                                      int max_active)
        {
                wq = alloc_workqueue("fsverity/%s", wq_flags, max_active,
                                     sb->s_id);
                ...
        }

So that XFS can have what I think makes sense (wq_flags == 0,
max_active == 0), and ext4/f2fs can continue the same way they do now.

--D

> - Eric
>
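A rough sketch of how the flag-parameterised helper floated above might be fleshed out; the s_verify_wq field comes from earlier in this series, but the idempotence check and error handling here are illustrative assumptions rather than the posted patch:

        int __fsverity_init_verify_wq(struct super_block *sb,
                                      unsigned int wq_flags, int max_active)
        {
                struct workqueue_struct *wq;

                /* Nothing to do if the workqueue has already been set up. */
                if (sb->s_verify_wq)
                        return 0;

                wq = alloc_workqueue("fsverity/%s", wq_flags, max_active,
                                     sb->s_id);
                if (!wq)
                        return -ENOMEM;

                sb->s_verify_wq = wq;
                return 0;
        }

With that shape, XFS could pass (0, 0) for a plain bound workqueue with the default concurrency limit, while ext4 and f2fs could pass (WQ_HIGHPRI, num_online_cpus()) to preserve their current behaviour.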
On Wed, Apr 24, 2024 at 11:05:20AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 04, 2024 at 11:14:07PM -0400, Eric Biggers wrote:
> > On Fri, Mar 29, 2024 at 05:35:48PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > Now that we've made the verity workqueue per-superblock, we don't need
> > > the systemwide workqueue. Get rid of the old implementation.
> >
> > This commit message needs to be rephrased because this commit isn't just
> > removing unused code.  It's also converting ext4 and f2fs over to the new
> > workqueue type.  (Maybe these two parts belong as separate patches?)
>
> Yes, will fix that.
>
> > Also, if there are any changes in the workqueue flags that are being used for
> > ext4 and f2fs, that needs to be documented.
>
> Hmm.  The current codebase does this:
>
>         fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
>                                                   WQ_HIGHPRI,
>                                                   num_online_cpus());
>
> Looking at commit f959325e6ac3 ("fsverity: Remove WQ_UNBOUND from
> fsverity read workqueue"), I guess you want a bound workqueue so that
> the CPU that handles the readahead ioend will also handle the verity
> validation?

That was the intent of that commit.  Hashing is fast enough on modern CPUs
that it seemed best to just do the work on the CPU that the interrupt comes
in on, instead of taking the performance hit of migrating the work to a
different CPU.  The cost of CPU migration was measured to be very significant
on arm64.

However, new information has come in indicating that the switch to a bound
workqueue is harmful in some cases.  Specifically, on some systems all
storage interrupts happen on the same CPU, and if the fsverity (or dm-verity
-- this applies there too) work takes too long due to having to read a lot of
Merkle tree blocks, too much work ends up queued up on that one CPU.

So this is an area that's under active investigation.  For example, some
improvements to unbound workqueues were upstreamed in v6.6, and I'll be
taking a look at those and checking whether switching back to an
appropriately-configured unbound workqueue would help.

> Why do you set max_active to num_online_cpus()?  Is that because the
> verity hash is (probably?) being computed on the CPUs, and there's only
> so many of those to go around, so there's little point in making more?
> Or is it to handle systems with more than WQ_DFL_ACTIVE (~256) CPUs?
> Maybe there's a different reason?
>
> If you add more CPUs to the system later, does this now constrain the
> number of CPUs that can be participating in verity validation?  Why not
> let the system try to process as many read ioends as are ready to be
> processed, rather than introducing a constraint here?

I'm afraid that not much thought has been put into the value of max_active.
dm-crypt has long used an unbound workqueue with
max_active=num_online_cpus(), and I think I had just copied it from there.
Then commit f959325e6ac3 just didn't change it when removing WQ_UNBOUND.

According to Documentation/core-api/workqueue.rst, max_active is a per-CPU
attribute.  So I doubt that num_online_cpus() makes sense regardless of bound
or unbound.  It probably should just use the default value of 256 (which is
specified by passing max_active=0).

> As for WQ_HIGHPRI, I wish Dave or Andrey would chime in on why this
> isn't appropriate for XFS.  I think they have a reason for this, but the
> most I can do is speculate that it's to avoid blocking other things in
> the system.

I think it's been shown that neither option (WQ_HIGHPRI or !WQ_HIGHPRI) is
best in all cases.  Ideally, the work for each individual I/O request would
get scheduled at a priority that is appropriate for that particular I/O.
This same problem shows up for dm-crypt, dm-verity, etc.

So if you have a reason to use !WQ_HIGHPRI for now, that seems reasonable,
but really all these subsystems need to be much smarter about scheduling
their work.

- Eric
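For reference, the two allocation styles being compared above, sketched as kernel C; the function names are illustrative, and only the flags/max_active arguments matter here:

        #include <linux/workqueue.h>
        #include <linux/cpumask.h>

        /* Existing fs/verity behavior: bound, high priority, with
         * max_active frozen at the number of CPUs online at init time. */
        static struct workqueue_struct *fsverity_alloc_wq_current(void)
        {
                return alloc_workqueue("fsverity_read_queue", WQ_HIGHPRI,
                                       num_online_cpus());
        }

        /* The alternative Eric suggests: pass 0 so the workqueue core
         * applies its default per-CPU limit (WQ_DFL_ACTIVE, currently 256). */
        static struct workqueue_struct *fsverity_alloc_wq_suggested(void)
        {
                return alloc_workqueue("fsverity_read_queue", WQ_HIGHPRI, 0);
        }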
On 2024-04-24 11:05:20, Darrick J. Wong wrote:
> On Thu, Apr 04, 2024 at 11:14:07PM -0400, Eric Biggers wrote:
> > On Fri, Mar 29, 2024 at 05:35:48PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > Now that we've made the verity workqueue per-superblock, we don't need
> > > the systemwide workqueue. Get rid of the old implementation.
> >
> > This commit message needs to be rephrased because this commit isn't just
> > removing unused code.  It's also converting ext4 and f2fs over to the new
> > workqueue type.  (Maybe these two parts belong as separate patches?)
>
> Yes, will fix that.
>
> > Also, if there are any changes in the workqueue flags that are being used for
> > ext4 and f2fs, that needs to be documented.
>
> Hmm.  The current codebase does this:
>
>         fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
>                                                   WQ_HIGHPRI,
>                                                   num_online_cpus());
>
> Looking at commit f959325e6ac3 ("fsverity: Remove WQ_UNBOUND from
> fsverity read workqueue"), I guess you want a bound workqueue so that
> the CPU that handles the readahead ioend will also handle the verity
> validation?
>
> Why do you set max_active to num_online_cpus()?  Is that because the
> verity hash is (probably?) being computed on the CPUs, and there's only
> so many of those to go around, so there's little point in making more?
> Or is it to handle systems with more than WQ_DFL_ACTIVE (~256) CPUs?
> Maybe there's a different reason?
>
> If you add more CPUs to the system later, does this now constrain the
> number of CPUs that can be participating in verity validation?  Why not
> let the system try to process as many read ioends as are ready to be
> processed, rather than introducing a constraint here?
>
> As for WQ_HIGHPRI, I wish Dave or Andrey would chime in on why this
> isn't appropriate for XFS.  I think they have a reason for this, but the
> most I can do is speculate that it's to avoid blocking other things in
> the system.

The log uses WQ_HIGHPRI for journal I/O completion
(log->l_ioend_workqueue).  As far as I understand, some data I/O completions
can require a transaction, which makes a reservation, which can leave data
I/O waiting on journal I/O.  If data I/O completion were scheduled first,
that could be a possible deadlock.  I don't have a concrete example, but I'm
also not sure why fs-verity would need to be high priority in XFS.

> In Andrey's V5 patch, XFS creates its own workqueue like this:
> https://lore.kernel.org/linux-xfs/20240304191046.157464-10-aalbersh@redhat.com/
>
>         struct workqueue_struct *wq = alloc_workqueue(
>                 "pread/%s", (WQ_FREEZABLE | WQ_MEM_RECLAIM), 0, sb->s_id);
>
> I don't grok this either -- read ioend workqueues aren't usually
> involved in memory reclaim at all, and I can't see why you'd want to
> freeze the verity workqueue during suspend.  Reads are allowed on frozen
> filesystems, so I don't see why verity would be any different.

Yeah, maybe freezable can go away; I initially picked those flags because
most of the other workqueues in XFS use the same configuration.
On Mon, Apr 29, 2024 at 12:15:55PM +0200, Andrey Albershteyn wrote:
> On 2024-04-24 11:05:20, Darrick J. Wong wrote:
> > On Thu, Apr 04, 2024 at 11:14:07PM -0400, Eric Biggers wrote:
> > > On Fri, Mar 29, 2024 at 05:35:48PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > >
> > > > Now that we've made the verity workqueue per-superblock, we don't need
> > > > the systemwide workqueue. Get rid of the old implementation.
> > >
> > > This commit message needs to be rephrased because this commit isn't just
> > > removing unused code.  It's also converting ext4 and f2fs over to the new
> > > workqueue type.  (Maybe these two parts belong as separate patches?)
> >
> > Yes, will fix that.
> >
> > > Also, if there are any changes in the workqueue flags that are being used for
> > > ext4 and f2fs, that needs to be documented.
> >
> > Hmm.  The current codebase does this:
> >
> >         fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> >                                                   WQ_HIGHPRI,
> >                                                   num_online_cpus());
> >
> > Looking at commit f959325e6ac3 ("fsverity: Remove WQ_UNBOUND from
> > fsverity read workqueue"), I guess you want a bound workqueue so that
> > the CPU that handles the readahead ioend will also handle the verity
> > validation?
> >
> > Why do you set max_active to num_online_cpus()?  Is that because the
> > verity hash is (probably?) being computed on the CPUs, and there's only
> > so many of those to go around, so there's little point in making more?
> > Or is it to handle systems with more than WQ_DFL_ACTIVE (~256) CPUs?
> > Maybe there's a different reason?
> >
> > If you add more CPUs to the system later, does this now constrain the
> > number of CPUs that can be participating in verity validation?  Why not
> > let the system try to process as many read ioends as are ready to be
> > processed, rather than introducing a constraint here?
> >
> > As for WQ_HIGHPRI, I wish Dave or Andrey would chime in on why this
> > isn't appropriate for XFS.  I think they have a reason for this, but the
> > most I can do is speculate that it's to avoid blocking other things in
> > the system.
>
> The log uses WQ_HIGHPRI for journal I/O completion
> (log->l_ioend_workqueue).  As far as I understand, some data I/O completions
> can require a transaction, which makes a reservation, which can leave data
> I/O waiting on journal I/O.  If data I/O completion were scheduled first,
> that could be a possible deadlock.  I don't have a concrete example, but I'm
> also not sure why fs-verity would need to be high priority in XFS.

Ah, ok.  I'll add a comment about that to my patch then.

> > In Andrey's V5 patch, XFS creates its own workqueue like this:
> > https://lore.kernel.org/linux-xfs/20240304191046.157464-10-aalbersh@redhat.com/
> >
> >         struct workqueue_struct *wq = alloc_workqueue(
> >                 "pread/%s", (WQ_FREEZABLE | WQ_MEM_RECLAIM), 0, sb->s_id);
> >
> > I don't grok this either -- read ioend workqueues aren't usually
> > involved in memory reclaim at all, and I can't see why you'd want to
> > freeze the verity workqueue during suspend.  Reads are allowed on frozen
> > filesystems, so I don't see why verity would be any different.
>
> Yeah, maybe freezable can go away; I initially picked those flags because
> most of the other workqueues in XFS use the same configuration.

<nod>

--D

> --
> - Andrey
>
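If the flag-parameterised form of __fsverity_init_verify_wq() discussed earlier in the thread were adopted, the XFS setup could reduce to something like the sketch below; the function name is hypothetical and the call site is illustrative only, not taken from the posted patches:

        /* Illustrative only: how an XFS mount path might set up the verity
         * workqueue with the (sb, wq_flags, max_active) helper floated above. */
        static int xfs_init_verity_wq_sketch(struct super_block *sb)
        {
                /*
                 * No WQ_HIGHPRI: verity work must not crowd out journal I/O
                 * completion.  No WQ_FREEZABLE or WQ_MEM_RECLAIM either,
                 * since verified reads are not part of memory reclaim and
                 * are still allowed on a frozen filesystem.  max_active of 0
                 * keeps the workqueue core's default concurrency limit.
                 */
                return __fsverity_init_verify_wq(sb, 0, 0);
        }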
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e44ccaf348f2..c2da9a0009e26 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -28,6 +28,7 @@
 #include <linux/btrfs.h>
 #include <linux/security.h>
 #include <linux/fs_parser.h>
+#include <linux/fsverity.h>
 #include "messages.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -924,6 +925,11 @@ static int btrfs_fill_super(struct super_block *sb,
 	sb->s_export_op = &btrfs_export_ops;
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &btrfs_verityops;
+	err = fsverity_init_verify_wq(sb);
+	if (err) {
+		btrfs_err(fs_info, "fsverity_init_verify_wq failed");
+		return err;
+	}
 #endif
 	sb->s_xattr = btrfs_xattr_handlers;
 	sb->s_time_gran = 1;
diff --git a/fs/buffer.c b/fs/buffer.c
index 4f73d23c2c469..a2cb50ecfb829 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -327,13 +327,15 @@ static void decrypt_bh(struct work_struct *work)
 	err = fscrypt_decrypt_pagecache_blocks(bh->b_folio, bh->b_size,
 					       bh_offset(bh));
 	if (err == 0 && need_fsverity(bh)) {
+		struct inode *inode = bh->b_folio->mapping->host;
+
 		/*
 		 * We use different work queues for decryption and for verity
 		 * because verity may require reading metadata pages that need
 		 * decryption, and we shouldn't recurse to the same workqueue.
 		 */
 		INIT_WORK(&ctx->work, verify_bh);
-		fsverity_enqueue_verify_work(&ctx->work);
+		fsverity_enqueue_verify_work(inode->i_sb, &ctx->work);
 		return;
 	}
 	end_buffer_async_read(bh, err == 0);
@@ -362,7 +364,8 @@ static void end_buffer_async_read_io(struct buffer_head *bh, int uptodate)
 			fscrypt_enqueue_decrypt_work(&ctx->work);
 		} else {
 			INIT_WORK(&ctx->work, verify_bh);
-			fsverity_enqueue_verify_work(&ctx->work);
+			fsverity_enqueue_verify_work(inode->i_sb,
+						     &ctx->work);
 		}
 		return;
 	}
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 21e8f0aebb3c6..e40566b0ddb65 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -61,6 +61,7 @@ enum bio_post_read_step {
 
 struct bio_post_read_ctx {
 	struct bio *bio;
+	const struct inode *inode;
 	struct work_struct work;
 	unsigned int cur_step;
 	unsigned int enabled_steps;
@@ -132,7 +133,8 @@ static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
 	case STEP_VERITY:
 		if (ctx->enabled_steps & (1 << STEP_VERITY)) {
 			INIT_WORK(&ctx->work, verity_work);
-			fsverity_enqueue_verify_work(&ctx->work);
+			fsverity_enqueue_verify_work(ctx->inode->i_sb,
+						     &ctx->work);
 			return;
 		}
 		ctx->cur_step++;
@@ -195,6 +197,7 @@ static void ext4_set_bio_post_read_ctx(struct bio *bio,
 			mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
 
 		ctx->bio = bio;
+		ctx->inode = inode;
 		ctx->enabled_steps = post_read_steps;
 		bio->bi_private = ctx;
 	}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cfb8449c731f9..f7c834f4e2b1f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5332,6 +5332,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 #endif
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &ext4_verityops;
+	err = fsverity_init_verify_wq(sb);
+	if (err)
+		goto failed_mount3a;
 #endif
 #ifdef CONFIG_QUOTA
 	sb->dq_op = &ext4_quota_operations;
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 8892c82621414..efd0b0a3a2c37 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1775,7 +1775,8 @@ void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed,
 		 * file, and these metadata pages may be compressed.
 		 */
 		INIT_WORK(&dic->verity_work, f2fs_verify_cluster);
-		fsverity_enqueue_verify_work(&dic->verity_work);
+		fsverity_enqueue_verify_work(dic->inode->i_sb,
+					     &dic->verity_work);
 		return;
 	}
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9494b5fc7c18..994339216a06e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -221,7 +221,7 @@ static void f2fs_verify_and_finish_bio(struct bio *bio, bool in_task)
 
 	if (ctx && (ctx->enabled_steps & STEP_VERITY)) {
 		INIT_WORK(&ctx->work, f2fs_verify_bio);
-		fsverity_enqueue_verify_work(&ctx->work);
+		fsverity_enqueue_verify_work(ctx->sbi->sb, &ctx->work);
 	} else {
 		f2fs_finish_read_bio(bio, in_task);
 	}
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a6867f26f1418..422527d9f6baf 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4423,6 +4423,9 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 #endif
 #ifdef CONFIG_FS_VERITY
 	sb->s_vop = &f2fs_verityops;
+	err = fsverity_init_verify_wq(sb);
+	if (err)
+		goto free_bio_info;
 #endif
 	sb->s_xattr = f2fs_xattr_handlers;
 	sb->s_export_op = &f2fs_export_ops;
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index 195a92f203bba..24846d475502d 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -154,8 +154,6 @@ static inline void fsverity_init_signature(void)
 
 /* verify.c */
 
-void __init fsverity_init_workqueue(void);
-
 static inline bool fsverity_caches_blocks(const struct inode *inode)
 {
 	const struct fsverity_operations *vops = inode->i_sb->s_vop;
diff --git a/fs/verity/init.c b/fs/verity/init.c
index 3769d2dc9e3b4..4663696c6996c 100644
--- a/fs/verity/init.c
+++ b/fs/verity/init.c
@@ -66,7 +66,6 @@ static int __init fsverity_init(void)
 {
 	fsverity_check_hash_algs();
 	fsverity_init_info_cache();
-	fsverity_init_workqueue();
 	fsverity_init_sysctl();
 	fsverity_init_signature();
 	fsverity_init_bpf();
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 4acfd02b0e42d..99c6d31fbcfba 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -10,8 +10,6 @@
 #include <crypto/hash.h>
 #include <linux/bio.h>
 
-static struct workqueue_struct *fsverity_read_workqueue;
-
 /*
  * Returns true if the hash block with index @hblock_idx in the tree has
  * already been verified.
@@ -384,33 +382,18 @@ EXPORT_SYMBOL_GPL(__fsverity_init_verify_wq);
 
 /**
  * fsverity_enqueue_verify_work() - enqueue work on the fs-verity workqueue
+ * @sb: superblock for this filesystem
  * @work: the work to enqueue
  *
 * Enqueue verification work for asynchronous processing.
 */
-void fsverity_enqueue_verify_work(struct work_struct *work)
+void fsverity_enqueue_verify_work(struct super_block *sb,
+				  struct work_struct *work)
 {
-	queue_work(fsverity_read_workqueue, work);
+	queue_work(sb->s_verify_wq, work);
 }
 EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
 
-void __init fsverity_init_workqueue(void)
-{
-	/*
-	 * Use a high-priority workqueue to prioritize verification work, which
-	 * blocks reads from completing, over regular application tasks.
-	 *
-	 * For performance reasons, don't use an unbound workqueue.  Using an
-	 * unbound workqueue for crypto operations causes excessive scheduler
-	 * latency on ARM64.
-	 */
-	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
-						  WQ_HIGHPRI,
-						  num_online_cpus());
-	if (!fsverity_read_workqueue)
-		panic("failed to allocate fsverity_read_queue");
-}
-
 /**
  * fsverity_read_merkle_tree_block() - read Merkle tree block
  * @inode: inode to which this Merkle tree blocks belong
diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h
index dcf9d9cffcb9f..ed93ca06ade5f 100644
--- a/include/linux/fsverity.h
+++ b/include/linux/fsverity.h
@@ -302,7 +302,8 @@ int fsverity_ioctl_read_metadata(struct file *filp, const void __user *uarg);
 
 bool fsverity_verify_blocks(struct folio *folio, size_t len, size_t offset);
 void fsverity_verify_bio(struct bio *bio);
-void fsverity_enqueue_verify_work(struct work_struct *work);
+void fsverity_enqueue_verify_work(struct super_block *sb,
+				  struct work_struct *work);
 int __fsverity_init_verify_wq(struct super_block *sb);
 
 static inline int fsverity_init_verify_wq(struct super_block *sb)
@@ -384,7 +385,8 @@ static inline void fsverity_verify_bio(struct bio *bio)
 	WARN_ON_ONCE(1);
 }
 
-static inline void fsverity_enqueue_verify_work(struct work_struct *work)
+static inline void fsverity_enqueue_verify_work(struct super_block *sb,
+						struct work_struct *work)
 {
 	WARN_ON_ONCE(1);
 }