| Message ID | 20221213172935.680971-10-aalbersh@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | fs-verity support for XFS |
On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
>
> The work_struct items are allocated from bioset side by side with
> bio being processed.
>
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
>
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}

fs/verity/ no longer uses PG_error to report errors. See upstream commit
98dc08bae678 ("fsverity: stop using PG_error to track error status").

- Eric
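For reference, a minimal sketch of what the folio walk might look like after rebasing onto that commit, assuming (per the commit's description) that fsverity_verify_bio() now fails the whole bio via bio->bi_status rather than setting a page flag:

	static void iomap_read_end_io(struct bio *bio)
	{
		/*
		 * Sketch only: with fs/verity marking verification failure in
		 * bio->bi_status, no per-page PG_error check is needed and the
		 * existing status-to-errno conversion covers both media errors
		 * and verity failures.
		 */
		int error = blk_status_to_errno(bio->bi_status);
		struct folio_iter fi;

		bio_for_each_folio_all(fi, bio)
			iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
		bio_put(bio);
	}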
On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> Add fs-verity page verification in read IO path. The verification
> itself is offloaded into workqueue (provided by fs-verity).
>
> The work_struct items are allocated from bioset side by side with
> bio being processed.
>
> As inodes with fs-verity doesn't use large folios we check only
> first page of the folio for errors (set by fs-verity if verification
> failed).
>
> Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> ---
>  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/iomap.h  |  5 +++
>  2 files changed, 81 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13d..b7abc2f806cfc 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -17,6 +17,7 @@
>  #include <linux/bio.h>
>  #include <linux/sched/signal.h>
>  #include <linux/migrate.h>
> +#include <linux/fsverity.h>
>  #include "trace.h"
>
>  #include "../internal.h"
> @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
>  }
>
>  static struct bio_set iomap_ioend_bioset;
> +static struct bio_set iomap_readend_bioset;
>
>  static struct iomap_page *
>  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
>  	int error = blk_status_to_errno(bio->bi_status);
>  	struct folio_iter fi;
>
> -	bio_for_each_folio_all(fi, bio)
> +	bio_for_each_folio_all(fi, bio) {
> +		/*
> +		 * As fs-verity doesn't work with multi-page folios, verity
> +		 * inodes have large folios disabled (only single page folios
> +		 * are used)
> +		 */
> +		if (!error)
> +			error = PageError(folio_page(fi.folio, 0));
> +
>  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> +	}
> +
>  	bio_put(bio);
> +	/* The iomap_readend has been freed by bio_put() */
> +}
> +
> +static void iomap_read_work_end_io(
> +		struct work_struct *work)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(work, struct iomap_readend, read_work);
> +	struct bio *bio = &ctx->read_inline_bio;
> +
> +	fsverity_verify_bio(bio);
> +	iomap_read_end_io(bio);
> +}
> +
> +static void iomap_read_work_io(struct bio *bio)
> +{
> +	struct iomap_readend *ctx =
> +		container_of(bio, struct iomap_readend, read_inline_bio);
> +
> +	fsverity_enqueue_verify_work(&ctx->read_work);
>  }

Ok, so fsverity simply queues this to a global workqueue it has set up as
WQ_UNBOUND | WQ_HIGHPRI with, effectively, one worker thread per CPU. This
will work for simple cases and to get the basic infrastructure in place,
but there are problems here that we will need to address.

1. High priority workqueues are used within XFS to ensure that data IO
completion cannot stall processing of journal IO completions. We use them
to prevent IO priority inversion deadlocks. That is, journal IO completions
use a WQ_HIGHPRI workqueue to ensure that they are scheduled ahead of data
IO completions, as data IO completions may require journal IO to complete
before they can make progress (e.g. to ensure transaction reservations in
IO completion can make progress). Hence using a WQ_HIGHPRI workqueue
directly in the user data IO path is a potential filesystem
livelock/deadlock vector. At best, it's going to impact the latency of
journal commits and hence anything that is running data integrity
operations whilst fsverity read operations are in progress.
2. The fsverity workqueue is global, so it creates a cross-filesystem
contention point. A single busy fsverity filesystem will create
unpredictable IO latency for filesystems that only use fsverity sparingly;
i.e. heavy amounts of fsverity work on one filesystem will impact the
performance of journal and other IO operations on all other active
filesystems, because fsverity uses WQ_HIGHPRI.

This sort of workqueue setup is typically fine for a single-user device
that has lots of otherwise idle CPUs that can be co-opted to create
concurrency in IO completion work where there would otherwise be none
(e.g. an Android phone). However, it is less than ideal for a highly
concurrent application using AIO or io_uring to push a couple of million
IOPS through the filesystem. In these situations, we don't want IO
completion work to be spread out, because there are no idle CPUs to run it
on. Instead, locality is king - we want completions to run on the same CPU
that already has the inode, bio, and other objects hot in the cache.

So, really, for XFS we want per-filesystem, locality-preserving per-cpu
workqueues for fsverity work, as we get IO concurrency (and therefore
fsverity work concurrency) from the application IO patterns. That also
avoids all the potential issues with using WQ_HIGHPRI, which journal IO
relies on to avoid data IO completion deadlocks.

>  struct iomap_readpage_ctx {
> @@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  	loff_t orig_pos = pos;
>  	size_t poff, plen;
>  	sector_t sector;
> +	struct iomap_readend *readend;
>
>  	if (iomap->type == IOMAP_INLINE)
>  		return iomap_read_inline_data(iter, folio);
> @@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>
>  	if (iomap_block_needs_zeroing(iter, pos)) {
>  		folio_zero_range(folio, poff, plen);
> -		iomap_set_range_uptodate(folio, iop, poff, plen);
> +		if (!fsverity_active(iter->inode)) {
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +			goto done;
> +		}
> +
> +		/*
> +		 * As fs-verity doesn't work with folios sealed inodes have
> +		 * multi-page folios disabled and we can check on first and only
> +		 * page
> +		 */
> +		if (fsverity_verify_page(folio_page(folio, 0)))
> +			iomap_set_range_uptodate(folio, iop, poff, plen);
> +		else
> +			folio_set_error(folio);

This makes me wonder if this should be an iomap->page_ops method rather
than calling fsverity code directly. i.e. the filesystem knows that
fsverity is enabled on the inode during its iomap_begin() callout, and that
state *must not change* until the IO is complete. Hence it can set an iomap
flag saying IOMAP_F_READ_VERIFY and add a page_ops vector with a
"verify_folio" callout.
Then this code can do:

	if (iomap_block_needs_zeroing(iter, pos)) {
		folio_zero_range(folio, poff, plen);
		if (iomap->flags & IOMAP_F_READ_VERIFY) {
			if (!iomap->page_ops->verify_folio(folio, poff, plen)) {
				folio_set_error(folio);
				goto done;
			}
		}
		iomap_set_range_uptodate(folio, iop, poff, plen);
		goto done;
	}

> @@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>
>  	if (ctx->rac) /* same as readahead_gfp_mask */
>  		gfp |= __GFP_NORETRY | __GFP_NOWARN;
> -	ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
> +	if (fsverity_active(iter->inode)) {
> +		ctx->bio = bio_alloc_bioset(iomap->bdev,
> +				bio_max_segs(nr_vecs), REQ_OP_READ,
> +				GFP_NOFS, &iomap_readend_bioset);
> +		readend = container_of(ctx->bio,
> +				struct iomap_readend,
> +				read_inline_bio);
> +		INIT_WORK(&readend->read_work, iomap_read_work_end_io);
> +	} else {
> +		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
>  				REQ_OP_READ, gfp);
> +	}
>  	/*
>  	 * If the bio_alloc fails, try it again for a single page to
>  	 * avoid having to deal with partial page reads. This emulates
> @@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>  		if (ctx->rac)
>  			ctx->bio->bi_opf |= REQ_RAHEAD;
>  		ctx->bio->bi_iter.bi_sector = sector;
> -		ctx->bio->bi_end_io = iomap_read_end_io;
> +		if (fsverity_active(iter->inode))
> +			ctx->bio->bi_end_io = iomap_read_work_io;
> +		else
> +			ctx->bio->bi_end_io = iomap_read_end_io;

Hmmm. OK, why not just always use the iomap_readend_bioset, put a flags
field in it and then do this check in iomap_read_end_io()? i.e.

	struct iomap_read_ioend {
		bool			queue_work;
		struct work_struct	work;	/* post read work (fs-verity) */
		struct bio		inline_bio;	/* MUST BE LAST! */
	};

Then always allocate from the iomap_read_ioend_bioset, and do:

	ctx->bio->bi_end_io = iomap_read_end_io;
	if (iomap->flags & IOMAP_F_VERITY) {
		read_ioend->queue_work = true;
		INIT_WORK(&read_ioend->work, iomap_read_ioend_work);
	}

Then in iomap_read_end_io():

	if (ioend->queue_work) {
		fsverity_enqueue_verify_work(&ioend->work);
		return;
	}
	iomap_read_end_io(bio);

And if we put a fs-supplied workqueue into the iomap as well, then we have
a mechanism for the filesystem to supply its own workqueue for fsverity,
fscrypt, or any other read-side post-IO data processing functions
filesystems care to add.

Cheers,

Dave.
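A rough sketch of that last suggestion, with hypothetical names throughout (neither a fs-supplied completion workqueue in the iomap nor IOMAP_F_VERITY exists at this point; the sb->s_postread_wq field and iomap_read_finish_bio() helper are made up for illustration):

	/*
	 * Hypothetical: the filesystem allocates a locality-preserving,
	 * non-WQ_HIGHPRI workqueue at mount time and hands it to iomap
	 * through its ->iomap_begin() callout, e.g.:
	 *
	 *	sb->s_postread_wq = alloc_workqueue("fs-postread/%s",
	 *			WQ_FREEZABLE, 0, sb->s_id);
	 */
	struct iomap_read_ioend {
		struct workqueue_struct	*wq;	/* fs-supplied, may be NULL */
		struct work_struct	work;	/* post-read processing */
		struct bio		inline_bio;	/* MUST BE LAST! */
	};

	static void iomap_read_end_io(struct bio *bio)
	{
		struct iomap_read_ioend *ioend =
			container_of(bio, struct iomap_read_ioend, inline_bio);

		/*
		 * Defer fs-verity/fscrypt post-processing to the filesystem's
		 * own workqueue instead of the global WQ_HIGHPRI fsverity
		 * workqueue, preserving per-fs isolation and CPU locality.
		 */
		if (ioend->wq) {
			queue_work(ioend->wq, &ioend->work);
			return;
		}
		iomap_read_finish_bio(bio);	/* hypothetical folio-walk helper */
	}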
On Tue, Dec 13, 2022 at 11:02:59AM -0800, Eric Biggers wrote:
> On Tue, Dec 13, 2022 at 06:29:33PM +0100, Andrey Albershteyn wrote:
> > Add fs-verity page verification in read IO path. The verification
> > itself is offloaded into workqueue (provided by fs-verity).
> >
> > The work_struct items are allocated from bioset side by side with
> > bio being processed.
> >
> > As inodes with fs-verity doesn't use large folios we check only
> > first page of the folio for errors (set by fs-verity if verification
> > failed).
> >
> > Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
> > ---
> >  fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
> >  include/linux/iomap.h  |  5 +++
> >  2 files changed, 81 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index 91ee0b308e13d..b7abc2f806cfc 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/bio.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/migrate.h>
> > +#include <linux/fsverity.h>
> >  #include "trace.h"
> >
> >  #include "../internal.h"
> > @@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
> >  }
> >
> >  static struct bio_set iomap_ioend_bioset;
> > +static struct bio_set iomap_readend_bioset;
> >
> >  static struct iomap_page *
> >  iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
> > @@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
> >  	int error = blk_status_to_errno(bio->bi_status);
> >  	struct folio_iter fi;
> >
> > -	bio_for_each_folio_all(fi, bio)
> > +	bio_for_each_folio_all(fi, bio) {
> > +		/*
> > +		 * As fs-verity doesn't work with multi-page folios, verity
> > +		 * inodes have large folios disabled (only single page folios
> > +		 * are used)
> > +		 */
> > +		if (!error)
> > +			error = PageError(folio_page(fi.folio, 0));
> > +
> >  		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
> > +	}
>
> fs/verity/ no longer uses PG_error to report errors. See upstream commit
> 98dc08bae678 ("fsverity: stop using PG_error to track error status").

Thanks! Missed that.

>
> - Eric
>
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 91ee0b308e13d..b7abc2f806cfc 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -17,6 +17,7 @@
 #include <linux/bio.h>
 #include <linux/sched/signal.h>
 #include <linux/migrate.h>
+#include <linux/fsverity.h>
 #include "trace.h"

 #include "../internal.h"
@@ -42,6 +43,7 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)
 }

 static struct bio_set iomap_ioend_bioset;
+static struct bio_set iomap_readend_bioset;

 static struct iomap_page *
 iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
@@ -189,9 +191,39 @@ static void iomap_read_end_io(struct bio *bio)
 	int error = blk_status_to_errno(bio->bi_status);
 	struct folio_iter fi;

-	bio_for_each_folio_all(fi, bio)
+	bio_for_each_folio_all(fi, bio) {
+		/*
+		 * As fs-verity doesn't work with multi-page folios, verity
+		 * inodes have large folios disabled (only single page folios
+		 * are used)
+		 */
+		if (!error)
+			error = PageError(folio_page(fi.folio, 0));
+
 		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+	}
+
 	bio_put(bio);
+	/* The iomap_readend has been freed by bio_put() */
+}
+
+static void iomap_read_work_end_io(
+		struct work_struct *work)
+{
+	struct iomap_readend *ctx =
+		container_of(work, struct iomap_readend, read_work);
+	struct bio *bio = &ctx->read_inline_bio;
+
+	fsverity_verify_bio(bio);
+	iomap_read_end_io(bio);
+}
+
+static void iomap_read_work_io(struct bio *bio)
+{
+	struct iomap_readend *ctx =
+		container_of(bio, struct iomap_readend, read_inline_bio);
+
+	fsverity_enqueue_verify_work(&ctx->read_work);
 }

 struct iomap_readpage_ctx {
@@ -264,6 +296,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 	loff_t orig_pos = pos;
 	size_t poff, plen;
 	sector_t sector;
+	struct iomap_readend *readend;

 	if (iomap->type == IOMAP_INLINE)
 		return iomap_read_inline_data(iter, folio);
@@ -276,7 +309,21 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,

 	if (iomap_block_needs_zeroing(iter, pos)) {
 		folio_zero_range(folio, poff, plen);
-		iomap_set_range_uptodate(folio, iop, poff, plen);
+		if (!fsverity_active(iter->inode)) {
+			iomap_set_range_uptodate(folio, iop, poff, plen);
+			goto done;
+		}
+
+		/*
+		 * As fs-verity doesn't work with folios sealed inodes have
+		 * multi-page folios disabled and we can check on first and only
+		 * page
+		 */
+		if (fsverity_verify_page(folio_page(folio, 0)))
+			iomap_set_range_uptodate(folio, iop, poff, plen);
+		else
+			folio_set_error(folio);
+
 		goto done;
 	}

@@ -297,8 +344,18 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,

 	if (ctx->rac) /* same as readahead_gfp_mask */
 		gfp |= __GFP_NORETRY | __GFP_NOWARN;
-	ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+	if (fsverity_active(iter->inode)) {
+		ctx->bio = bio_alloc_bioset(iomap->bdev,
+				bio_max_segs(nr_vecs), REQ_OP_READ,
+				GFP_NOFS, &iomap_readend_bioset);
+		readend = container_of(ctx->bio,
+				struct iomap_readend,
+				read_inline_bio);
+		INIT_WORK(&readend->read_work, iomap_read_work_end_io);
+	} else {
+		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
 				REQ_OP_READ, gfp);
+	}
 	/*
 	 * If the bio_alloc fails, try it again for a single page to
 	 * avoid having to deal with partial page reads. This emulates
@@ -311,7 +368,11 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		if (ctx->rac)
 			ctx->bio->bi_opf |= REQ_RAHEAD;
 		ctx->bio->bi_iter.bi_sector = sector;
-		ctx->bio->bi_end_io = iomap_read_end_io;
+		if (fsverity_active(iter->inode))
+			ctx->bio->bi_end_io = iomap_read_work_io;
+		else
+			ctx->bio->bi_end_io = iomap_read_end_io;
+
 		bio_add_folio(ctx->bio, folio, plen, poff);
 	}

@@ -1546,6 +1607,17 @@ EXPORT_SYMBOL_GPL(iomap_writepages);

 static int __init iomap_init(void)
 {
+#ifdef CONFIG_FS_VERITY
+	int error = 0;
+
+	error = bioset_init(&iomap_readend_bioset,
+			4 * (PAGE_SIZE / SECTOR_SIZE),
+			offsetof(struct iomap_readend, read_inline_bio),
+			BIOSET_NEED_BVECS);
+	if (error)
+		return error;
+#endif
+
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
 			   offsetof(struct iomap_ioend, io_inline_bio),
 			   BIOSET_NEED_BVECS);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 238a03087e17e..dbdef159b20d7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -264,6 +264,11 @@ struct iomap_ioend {
 	struct bio		io_inline_bio;	/* MUST BE LAST! */
 };

+struct iomap_readend {
+	struct work_struct	read_work;	/* post read work (fs-verity) */
+	struct bio		read_inline_bio;/* MUST BE LAST! */
+};
+
 struct iomap_writeback_ops {
 	/*
 	 * Required, maps the blocks so that writeback can be performed on
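The allocation trick the diff relies on is worth spelling out: bioset_init()'s front_pad argument reserves space in front of every bio allocated from the set, so a containing structure can be recovered with container_of() at completion time. A generic sketch of the pattern (all "my_" names are hypothetical):

	struct my_ioend {
		struct work_struct	work;
		struct bio		inline_bio;	/* MUST BE LAST! */
	};

	static struct bio_set my_bioset;

	static int __init my_init(void)
	{
		/*
		 * front_pad = offsetof(struct my_ioend, inline_bio) makes
		 * every bio allocated from this set sit at the tail of a
		 * struct my_ioend.
		 */
		return bioset_init(&my_bioset, BIO_POOL_SIZE,
				offsetof(struct my_ioend, inline_bio),
				BIOSET_NEED_BVECS);
	}

	/* In the bi_end_io handler, recover the containing structure: */
	static void my_end_io(struct bio *bio)
	{
		struct my_ioend *ioend =
			container_of(bio, struct my_ioend, inline_bio);

		queue_work(system_wq, &ioend->work);
	}

This is the same mechanism iomap_ioend_bioset already uses for write completions; the patch simply adds a second bioset with a work_struct in the pad for the read side.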
Add fs-verity page verification in the read IO path. The verification
itself is offloaded into a workqueue (provided by fs-verity).

The work_struct items are allocated from the bioset side by side with
the bio being processed.

As inodes with fs-verity don't use large folios, we check only the
first page of the folio for errors (set by fs-verity if verification
failed).

Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com>
---
 fs/iomap/buffered-io.c | 80 +++++++++++++++++++++++++++++++++++++++---
 include/linux/iomap.h  |  5 +++
 2 files changed, 81 insertions(+), 4 deletions(-)
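The single-page-folio assumption in the commit message has to be enforced when the inode is set up. A sketch of how a filesystem might do that (the helper name is made up; fsverity_active() and mapping_set_large_folios() are the real APIs):

	static void example_setup_folio_support(struct inode *inode)
	{
		/*
		 * Only opt the mapping into multi-page folios when fs-verity
		 * is not enabled, so the read path can assume verity files
		 * always use single-page folios.
		 */
		if (!fsverity_active(inode))
			mapping_set_large_folios(inode->i_mapping);
	}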