Message ID | 20231006184922.252188-8-aalbersh@redhat.com |
---|---|
State | New, archived |
Series | fs-verity support for XFS |
On Fri, Oct 06, 2023 at 08:49:01PM +0200, Andrey Albershteyn wrote:
> The bitmap is used to track verified status of the Merkle tree
> blocks which are smaller than a PAGE. Blocks which fits exactly in a
> page - use PageChecked() for tracking "verified" status.
>
> This patch switches to always use bitmap to track verified status.
> This is needed to move fs-verity away from page management and work
> only with Merkle tree blocks.

How complicated would it be to keep supporting using the page bit when
merkle_tree_block_size == page_size and the filesystem supports it? It's an
efficient solution, so it would be a shame to lose it. Also it doesn't have
the max file size limit that the bitmap has.

> Also, this patch removes spinlock. The lock was used to reset bits
> in bitmap belonging to one page. This patch works only with one
> Merkle tree block and won't reset other blocks status.

The spinlock is needed when there are multiple Merkle tree blocks per page
and the filesystem is using the page-based caching. So I don't think you can
remove it. Can you elaborate on why you feel it can be removed?

>  /*
> - * Returns true if the hash block with index @hblock_idx in the tree, located in
> - * @hpage, has already been verified.
> + * Returns true if the hash block with index @hblock_idx in the tree has
> + * already been verified.
>   */
> -static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
> -                                   unsigned long hblock_idx)
> +static bool is_hash_block_verified(struct fsverity_info *vi,
> +                                   unsigned long hblock_idx,
> +                                   bool block_cached)
>  {
> -        bool verified;
> -        unsigned int blocks_per_page;
> -        unsigned int i;
> -
> -        /*
> -         * When the Merkle tree block size and page size are the same, then the
> -         * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
> -         * to directly indicate whether the page's block has been verified.
> -         *
> -         * Using PG_checked also guarantees that we re-verify hash pages that
> -         * get evicted and re-instantiated from the backing storage, as new
> -         * pages always start out with PG_checked cleared.
> -         */
> -        if (!vi->hash_block_verified)
> -                return PageChecked(hpage);
> -
> -        /*
> -         * When the Merkle tree block size and page size differ, we use a bitmap
> -         * to indicate whether each hash block has been verified.
> -         *
> -         * However, we still need to ensure that hash pages that get evicted and
> -         * re-instantiated from the backing storage are re-verified. To do
> -         * this, we use PG_checked again, but now it doesn't really mean
> -         * "checked". Instead, now it just serves as an indicator for whether
> -         * the hash page is newly instantiated or not.
> -         *
> -         * The first thread that sees PG_checked=0 must clear the corresponding
> -         * bitmap bits, then set PG_checked=1. This requires a spinlock. To
> -         * avoid having to take this spinlock in the common case of
> -         * PG_checked=1, we start with an opportunistic lockless read.
> -         */

Note that the above comment explains why the spinlock is needed.

> -        if (PageChecked(hpage)) {
> -                /*
> -                 * A read memory barrier is needed here to give ACQUIRE
> -                 * semantics to the above PageChecked() test.
> -                 */
> -                smp_rmb();
> +        if (block_cached)
>                  return test_bit(hblock_idx, vi->hash_block_verified);

It's not clear what block_cached is supposed to mean here.

- Eric
On 2023-10-10 20:15:43, Eric Biggers wrote:
> On Fri, Oct 06, 2023 at 08:49:01PM +0200, Andrey Albershteyn wrote:
> > The bitmap is used to track verified status of the Merkle tree
> > blocks which are smaller than a PAGE. Blocks which fits exactly in a
> > page - use PageChecked() for tracking "verified" status.
> >
> > This patch switches to always use bitmap to track verified status.
> > This is needed to move fs-verity away from page management and work
> > only with Merkle tree blocks.
>
> How complicated would it be to keep supporting using the page bit when
> merkle_tree_block_size == page_size and the filesystem supports it? It's an
> efficient solution, so it would be a shame to lose it. Also it doesn't have the
> max file size limit that the bitmap has.

Well, I think it's possible, but my motivation was to step away from
page manipulation as much as possible, with the intent of not affecting
other filesystems too much. I can probably add handling of this case to
fsverity_read_merkle_tree_block(), but fs/verity will still create the
bitmap and have a limit. The other way is to basically revert the
changes done in patch 09; then it would probably be quite a mix of
page/block handling in fs/verity/verify.c.

> > Also, this patch removes spinlock. The lock was used to reset bits
> > in bitmap belonging to one page. This patch works only with one
> > Merkle tree block and won't reset other blocks status.
>
> The spinlock is needed when there are multiple Merkle tree blocks per page and
> the filesystem is using the page-based caching. So I don't think you can remove
> it. Can you elaborate on why you feel it can be removed?

With this patch is_hash_block_verified() doesn't reset bits for
blocks belonging to the same page. Even if the page is re-instantiated,
only one block is checked in this case. So, when other blocks are
checked, they are reset.

	if (block_cached)
		return test_bit(hblock_idx, vi->hash_block_verified);

> > -        if (PageChecked(hpage)) {
> > -                /*
> > -                 * A read memory barrier is needed here to give ACQUIRE
> > -                 * semantics to the above PageChecked() test.
> > -                 */
> > -                smp_rmb();
> > +        if (block_cached)
> >                  return test_bit(hblock_idx, vi->hash_block_verified);
>
> It's not clear what block_cached is supposed to mean here.

If the block was re-instantiated or was in the cache. I can try to come
up with a better name.
On Wed, Oct 11, 2023 at 03:03:55PM +0200, Andrey Albershteyn wrote:
> > How complicated would it be to keep supporting using the page bit when
> > merkle_tree_block_size == page_size and the filesystem supports it? It's an
> > efficient solution, so it would be a shame to lose it. Also it doesn't have the
> > max file size limit that the bitmap has.
>
> Well, I think it's possible but my motivation was to step away from
> page manipulation as much as possible with intent to not affect other
> filesystems too much. I can probably add handling of this case to
> fsverity_read_merkle_tree_block() but fs/verity still will create
> bitmap and have a limit. The other way is basically revert changes
> done in patch 09, then, it probably will be quite a mix of page/block
> handling in fs/verity/verify.c

The page-based caching still has to be supported anyway, since that's what the
other filesystems that support fsverity use, and it seems you don't plan to
change that. The question is where the "block verified" flags should be
stored. Currently there are two options: PG_checked and the separate bitmap.
I'm not yet convinced that removing the support for the PG_checked method is a
good change. PG_checked is a nice solution for the cases where it can be used;
it requires no extra memory, no locking, and has no max file size. Also, this
change seems mostly orthogonal to what you're actually trying to accomplish.

> > > Also, this patch removes spinlock. The lock was used to reset bits
> > > in bitmap belonging to one page. This patch works only with one
> > > Merkle tree block and won't reset other blocks status.
> >
> > The spinlock is needed when there are multiple Merkle tree blocks per page and
> > the filesystem is using the page-based caching. So I don't think you can remove
> > it. Can you elaborate on why you feel it can be removed?
>
> With this patch is_hash_block_verified() doesn't reset bits for
> blocks belonging to the same page. Even if page is re-instantiated
> only one block is checked in this case. So, when other blocks are
> checked they are reset.
>
> 	if (block_cached)
> 		return test_bit(hblock_idx, vi->hash_block_verified);

When part of the Merkle tree cache is evicted and re-instantiated, whether that
part is a "page" or something else, the verified status of all the blocks
contained in that part needs to be invalidated so that they get re-verified.
The existing code does that. I don't see how your proposed code does that.

- Eric
Hi Eric,

[Please excuse my ignorance, this is only the third time I've dived
into fsverity.]

On Thu, Oct 12, 2023 at 12:27:46AM -0700, Eric Biggers wrote:
> On Wed, Oct 11, 2023 at 03:03:55PM +0200, Andrey Albershteyn wrote:
> > > How complicated would it be to keep supporting using the page bit when
> > > merkle_tree_block_size == page_size and the filesystem supports it? It's an
> > > efficient solution, so it would be a shame to lose it. Also it doesn't have the
> > > max file size limit that the bitmap has.

How complex would it be to get rid of the bitmap entirely, and validate
all the verity tree blocks within a page all at once instead of
individual blocks within a page?

Assuming willy isn't grinding his axe to get rid of PG_checked,
obviously. ;)

> > Well, I think it's possible but my motivation was to step away from
> > page manipulation as much as possible with intent to not affect other
> > filesystems too much. I can probably add handling of this case to
> > fsverity_read_merkle_tree_block() but fs/verity still will create
> > bitmap and have a limit. The other way is basically revert changes
> > done in patch 09, then, it probably will be quite a mix of page/block
> > handling in fs/verity/verify.c
>
> The page-based caching still has to be supported anyway, since that's what the
> other filesystems that support fsverity use, and it seems you don't plan to
> change that.

I frankly have been asking myself why /this/ patchset adds so much extra
code and flags and whatnot to XFS and fs/verity. From what I can tell,
the xfs buffer cache has been extended to allocate double the memory so
that xattr contents can be shadowed. getxattr for merkle tree contents
then pins the buffer, shadows the contents, and hands both back to the
caller (aka xfs_read_merkle_tree_block). The shadow memory is then
handed to fs/verity to do its magic; following that, fsverity releases
the reference and we can eventually drop the xfs_buf reference.

But this seems way overcomplicated to me. ->read_merkle_tree_page hands
us a pgoff_t and a suggestion for page readahead, and wants us to return
an uptodate locked page, right?

Why can't xfs allocate a page, walk the requested range to fill the page
with merkle tree blocks that were written to the xattr structure (or
zero the page contents if there is no xattr), and hand that page to
fsverity? (It helps to provide the merkle tree block size to
xfs_read_merkle_tree_page, thanks for adding that.)

Assuming fsverity also wants some caching, we could augment the
xfs_inode to point to a separate address_space for cached merkle tree
pages, and then xfs_read_merkle_tree_page can use __filemap_get_folio to
find uptodate cached pages, or instantiate one and make it uptodate.
Even better, we can pretty easily use multipage folios for this, though
AFAICT the fs/verity code isn't yet up to handling that.

The only thing I can't quite figure out is how to get memory reclaim to
scan the extra address_space when it wants to try to reclaim pages.
That part ext4 and f2fs got for free because they stuffed the merkle
tree in the posteof space.

But wouldn't that solve /all/ the plumbing problems without scattering
bits of new code and flags into the xfs buffer cache, the extended
attributes code, and elsewhere? And then xfs would not need to burn up
vmap space to allocate 8K memory blocks just to provide 4K Merkle tree
blocks to fs/verity.

That's just my 2 cents from spending a couple of hours hypothesizing how
I would fill out the fsverity_operations.

> The question is where the "block verified" flags should be stored.
> Currently there are two options: PG_checked and the separate bitmap. I'm not
> yet convinced that removing the support for the PG_checked method is a good
> change. PG_checked is a nice solution for the cases where it can be used; it
> requires no extra memory, no locking, and has no max file size. Also, this
> change seems mostly orthogonal to what you're actually trying to accomplish.

That scheme sounds way better to me than the bitmap. ;)

--D

> > > > Also, this patch removes spinlock. The lock was used to reset bits
> > > > in bitmap belonging to one page. This patch works only with one
> > > > Merkle tree block and won't reset other blocks status.
> > >
> > > The spinlock is needed when there are multiple Merkle tree blocks per page and
> > > the filesystem is using the page-based caching. So I don't think you can remove
> > > it. Can you elaborate on why you feel it can be removed?
> >
> > With this patch is_hash_block_verified() doesn't reset bits for
> > blocks belonging to the same page. Even if page is re-instantiated
> > only one block is checked in this case. So, when other blocks are
> > checked they are reset.
> >
> > 	if (block_cached)
> > 		return test_bit(hblock_idx, vi->hash_block_verified);
>
> When part of the Merkle tree cache is evicted and re-instantiated, whether that
> part is a "page" or something else, the verified status of all the blocks
> contained in that part need to be invalidated so that they get re-verified. The
> existing code does that. I don't see how your proposed code does that.
>
> - Eric
On 2023-10-12 00:27:46, Eric Biggers wrote:
> On Wed, Oct 11, 2023 at 03:03:55PM +0200, Andrey Albershteyn wrote:
> > > How complicated would it be to keep supporting using the page bit when
> > > merkle_tree_block_size == page_size and the filesystem supports it? It's an
> > > efficient solution, so it would be a shame to lose it. Also it doesn't have the
> > > max file size limit that the bitmap has.
> >
> > Well, I think it's possible but my motivation was to step away from
> > page manipulation as much as possible with intent to not affect other
> > filesystems too much. I can probably add handling of this case to
> > fsverity_read_merkle_tree_block() but fs/verity still will create
> > bitmap and have a limit. The other way is basically revert changes
> > done in patch 09, then, it probably will be quite a mix of page/block
> > handling in fs/verity/verify.c
>
> The page-based caching still has to be supported anyway, since that's what the
> other filesystems that support fsverity use, and it seems you don't plan to
> change that. The question is where the "block verified" flags should be
> stored. Currently there are two options: PG_checked and the separate bitmap.
> I'm not yet convinced that removing the support for the PG_checked method is a
> good change. PG_checked is a nice solution for the cases where it can be used;
> it requires no extra memory, no locking, and has no max file size. Also, this
> change seems mostly orthogonal to what you're actually trying to accomplish.

Yes, I was trying to combine PG_checked and the bitmap in a way that
could also be used by XFS. I will try to change this patch to not drop
the merkle_tree_block_size == page_size case.

> > > > Also, this patch removes spinlock. The lock was used to reset bits
> > > > in bitmap belonging to one page. This patch works only with one
> > > > Merkle tree block and won't reset other blocks status.
> > >
> > > The spinlock is needed when there are multiple Merkle tree blocks per page and
> > > the filesystem is using the page-based caching. So I don't think you can remove
> > > it. Can you elaborate on why you feel it can be removed?
> >
> > With this patch is_hash_block_verified() doesn't reset bits for
> > blocks belonging to the same page. Even if page is re-instantiated
> > only one block is checked in this case. So, when other blocks are
> > checked they are reset.
> >
> > 	if (block_cached)
> > 		return test_bit(hblock_idx, vi->hash_block_verified);
>
> When part of the Merkle tree cache is evicted and re-instantiated, whether that
> part is a "page" or something else, the verified status of all the blocks
> contained in that part need to be invalidated so that they get re-verified. The
> existing code does that. I don't see how your proposed code does that.

Oh, I see the problem now, I will revert it back.
On Thu, Oct 12, 2023 at 08:12:09PM -0700, Darrick J. Wong wrote:
> Hi Eric,
>
> [Please excuse my ignorance, this is only the third time I've dived
> into fsverity.]
>
> On Thu, Oct 12, 2023 at 12:27:46AM -0700, Eric Biggers wrote:
> > On Wed, Oct 11, 2023 at 03:03:55PM +0200, Andrey Albershteyn wrote:
> > > > How complicated would it be to keep supporting using the page bit when
> > > > merkle_tree_block_size == page_size and the filesystem supports it? It's an
> > > > efficient solution, so it would be a shame to lose it. Also it doesn't have the
> > > > max file size limit that the bitmap has.
>
> How complex would it be to get rid of the bitmap entirely, and validate
> all the verity tree blocks within a page all at once instead of
> individual blocks within a page?
>
> Assuming willy isn't grinding his axe to get rid of PG_checked,
> obviously. ;)

See what I wrote earlier at
https://lore.kernel.org/linux-xfs/Y5ltzp6yeMo1oDSk@sol.localdomain.
Basically it would increase the worst-case latency by a lot.

> > Well, I think it's possible but my motivation was to step away from
> > page manipulation as much as possible with intent to not affect other
> > filesystems too much. I can probably add handling of this case to
> > fsverity_read_merkle_tree_block() but fs/verity still will create
> > bitmap and have a limit. The other way is basically revert changes
> > done in patch 09, then, it probably will be quite a mix of page/block
> > handling in fs/verity/verify.c
>
> The page-based caching still has to be supported anyway, since that's what the
> other filesystems that support fsverity use, and it seems you don't plan to
> change that.

[snip]

> But this seems way overcomplicated to me. ->read_merkle_tree_page hands
> us a pgoff_t and a suggestion for page readahead, and wants us to return
> an uptodate locked page, right?
>
> Why can't xfs allocate a page, walk the requested range to fill the page
> with merkle tree blocks that were written to the xattr structure (or
> zero the page contents if there is no xattr), and hand that page to
> fsverity? (It helps to provide the merkle tree block size to
> xfs_read_merkle_tree_page, thanks for adding that.)

Earlier versions of this patchset did that. But, it's only really feasible if
the pages are actually cached. Otherwise it's very inefficient and can result
in random ENOMEM.

> Assuming fsverity also wants some caching, we could augment the
> xfs_inode to point to a separate address_space for cached merkle tree
> pages, and then xfs_read_merkle_tree_page can use __filemap_get_folio to
> find uptodate cached pages, or instantiate one and make it uptodate.
> Even better, we can pretty easily use multipage folios for this, though
> AFAICT the fs/verity code isn't yet up to handling that.
>
> The only thing I can't quite figure out is how to get memory reclaim to
> scan the extra address_space when it wants to try to reclaim pages.
> That part ext4 and f2fs got for free because they stuffed the merkle
> tree in the posteof space.
>
> But wouldn't that solve /all/ the plumbing problems without scattering
> bits of new code and flags into the xfs buffer cache, the extended
> attributes code, and elsewhere? And then xfs would not need to burn up
> vmap space to allocate 8K memory blocks just to provide 4K Merkle tree
> blocks to fs/verity.
>
> That's just my 2 cents from spending a couple of hours hypothesizing how
> I would fill out the fsverity_operations.

That might work. I'm not sure about the details though, e.g. can mapping->host
point to the file's inode or would it need to be a fake one.

- Eric
On Thu, Oct 12, 2023 at 12:27:46AM -0700, Eric Biggers wrote:
> Currently there are two options: PG_checked and the separate bitmap. I'm not
> yet convinced that removing the support for the PG_checked method is a good
> change. PG_checked is a nice solution for the cases where it can be used; it
> requires no extra memory, no locking, and has no max file size. Also, this
> change seems mostly orthogonal to what you're actually trying to accomplish.

Given that willy has been on a (IMHO reasonable) quest to kill off as
many page flags as possible, I'd really like to seize the opportunity
and kill PG_checked in fsverity. How big are the downsides of the bitmap
vs using the page flags, and do they matter in practice?
On Thu, Oct 12, 2023 at 08:12:09PM -0700, Darrick J. Wong wrote:
> I frankly have been asking myself why /this/ patchset adds so much extra
> code and flags and whatnot to XFS and fs/verity. From what I can tell,
> the xfs buffer cache has been extended to allocate double the memory so
> that xattr contents can be shadowed. getxattr for merkle tree contents
> then pins the buffer, shadows the contents, and hands both back to the
> caller (aka xfs_read_merkle_tree_block). The shadow memory is then
> handed to fs/verity to do its magic; following that, fsverity releases
> the reference and we can eventually drop the xfs_buf reference.
>
> But this seems way overcomplicated to me. ->read_merkle_tree_page hands
> us a pgoff_t and a suggestion for page readahead, and wants us to return
> an uptodate locked page, right?

It does. That being said, I really much prefer the block-based interface
from Andrey. It is a lot cleaner and without weird page cache internals,
although it can still be implemented very nicely by file systems that
store the tree in the page cache.

> The only thing I can't quite figure out is how to get memory reclaim to
> scan the extra address_space when it wants to try to reclaim pages.
> That part ext4 and f2fs got for free because they stuffed the merkle
> tree in the posteof space.

Except for the magic swapper_spaces, an address_space without an inode
is not a thing, so you need to allocate an extra inode anyway, which is
what reclaim works on.
On Mon, Oct 16, 2023 at 10:57:45PM -0700, Christoph Hellwig wrote:
> On Thu, Oct 12, 2023 at 12:27:46AM -0700, Eric Biggers wrote:
> > Currently there are two options: PG_checked and the separate bitmap. I'm not
> > yet convinced that removing the support for the PG_checked method is a good
> > change. PG_checked is a nice solution for the cases where it can be used; it
> > requires no extra memory, no locking, and has no max file size. Also, this
> > change seems mostly orthogonal to what you're actually trying to accomplish.
>
> Given that willy has been on a (IMHO reasonable) quest to kill off as
> many page flags as possible, I'd really like to seize the opportunity
> and kill PG_checked in fsverity. How big are the downsides of the bitmap
> vs using the page flags, and do they matter in practice?

Currently PG_checked is used even with the bitmap-based approach, as a way to
ensure that hash pages that get evicted and re-instantiated from the backing
storage are re-verified. See is_hash_block_verified() in fs/verity/verify.c.
That would need to be replaced with something else. I'm not sure what else
would work and still be efficient.

No one has actually deployed the bitmap-based approach in production yet; it
was added in v6.3 and is only used for merkle_tree_block_size != PAGE_SIZE.
But, the performance and memory overhead are probably not significant in
practice.

I'm more worried about the file size limit of the bitmap-based approach, which
is currently ~4 TB. *Probably* no one is using fsverity on files that large,
but introducing a limit for a case that wasn't supported before
(merkle_tree_block_size != PAGE_SIZE) isn't as bad as retroactively
introducing a limit for existing files that worked before and refusing to open
them. Huge files would need something other than a simple bitmap, which would
add complexity and overhead.

Note that many filesystems use PG_checked for other purposes such as keeping
track of when directory blocks have been validated. I'm not sure how feasible
it will be to free up that flag entirely.

- Eric
On Mon, Oct 16, 2023 at 09:58:34PM -0700, Eric Biggers wrote:
> On Thu, Oct 12, 2023 at 08:12:09PM -0700, Darrick J. Wong wrote:
> > Hi Eric,
> >
> > [Please excuse my ignorance, this is only the third time I've dived
> > into fsverity.]
> >
> > On Thu, Oct 12, 2023 at 12:27:46AM -0700, Eric Biggers wrote:
> > > On Wed, Oct 11, 2023 at 03:03:55PM +0200, Andrey Albershteyn wrote:
> > > > > How complicated would it be to keep supporting using the page bit when
> > > > > merkle_tree_block_size == page_size and the filesystem supports it? It's an
> > > > > efficient solution, so it would be a shame to lose it. Also it doesn't have the
> > > > > max file size limit that the bitmap has.
> >
> > How complex would it be to get rid of the bitmap entirely, and validate
> > all the verity tree blocks within a page all at once instead of
> > individual blocks within a page?
> >
> > Assuming willy isn't grinding his axe to get rid of PG_checked,
> > obviously. ;)
>
> See what I wrote earlier at
> https://lore.kernel.org/linux-xfs/Y5ltzp6yeMo1oDSk@sol.localdomain.
> Basically it would increase the worst-case latency by a lot.

Ahh. Yeah, ok, got it, you don't necessarily want to prefault a bunch
more merkle tree blocks all at once just to read a single byte. :)

> > > > Well, I think it's possible but my motivation was to step away from
> > > > page manipulation as much as possible with intent to not affect other
> > > > filesystems too much. I can probably add handling of this case to
> > > > fsverity_read_merkle_tree_block() but fs/verity still will create
> > > > bitmap and have a limit. The other way is basically revert changes
> > > > done in patch 09, then, it probably will be quite a mix of page/block
> > > > handling in fs/verity/verify.c
> > >
> > > The page-based caching still has to be supported anyway, since that's what the
> > > other filesystems that support fsverity use, and it seems you don't plan to
> > > change that.
> >
> > I frankly have been asking myself why /this/ patchset adds so much extra
> > code and flags and whatnot to XFS and fs/verity. From what I can tell,
> > the xfs buffer cache has been extended to allocate double the memory so
> > that xattr contents can be shadowed. getxattr for merkle tree contents
> > then pins the buffer, shadows the contents, and hands both back to the
> > caller (aka xfs_read_merkle_tree_block). The shadow memory is then
> > handed to fs/verity to do its magic; following that, fsverity releases
> > the reference and we can eventually drop the xfs_buf reference.
> >
> > But this seems way overcomplicated to me. ->read_merkle_tree_page hands
> > us a pgoff_t and a suggestion for page readahead, and wants us to return
> > an uptodate locked page, right?
> >
> > Why can't xfs allocate a page, walk the requested range to fill the page
> > with merkle tree blocks that were written to the xattr structure (or
> > zero the page contents if there is no xattr), and hand that page to
> > fsverity? (It helps to provide the merkle tree block size to
> > xfs_read_merkle_tree_page, thanks for adding that.)
>
> Earlier versions of this patchset did that. But, it's only really feasible if
> the pages are actually cached. Otherwise it's very inefficient and can result
> in random ENOMEM.
>
> > Assuming fsverity also wants some caching, we could augment the
> > xfs_inode to point to a separate address_space for cached merkle tree
> > pages, and then xfs_read_merkle_tree_page can use __filemap_get_folio to
> > find uptodate cached pages, or instantiate one and make it uptodate.
> > Even better, we can pretty easily use multipage folios for this, though
> > AFAICT the fs/verity code isn't yet up to handling that.
> >
> > The only thing I can't quite figure out is how to get memory reclaim to
> > scan the extra address_space when it wants to try to reclaim pages.
> > That part ext4 and f2fs got for free because they stuffed the merkle
> > tree in the posteof space.
> >
> > But wouldn't that solve /all/ the plumbing problems without scattering
> > bits of new code and flags into the xfs buffer cache, the extended
> > attributes code, and elsewhere? And then xfs would not need to burn up
> > vmap space to allocate 8K memory blocks just to provide 4K Merkle tree
> > blocks to fs/verity.
> >
> > That's just my 2 cents from spending a couple of hours hypothesizing how
> > I would fill out the fsverity_operations.
>
> That might work. I'm not sure about the details though, e.g. can mapping->host
> point to the file's inode or would it need to be a fake one.

Me neither. I /think/ invalidate_mapping_pages would do the job, though
one would probably want to have a custom shrinker so we could find out
just how many pages are desired by reclaim, and then try to invalidate
only that many pages. Not sure what happens if there are multiple
mappings whose ->host pointers point to the same inode.

Alternately I wonder if we could make a tmpfs file that would evict live
contents instead of writing them to the paging file... ;)

--D

> - Eric
diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
index d071a6e32581..9611eeae3527 100644
--- a/fs/verity/fsverity_private.h
+++ b/fs/verity/fsverity_private.h
@@ -69,7 +69,6 @@ struct fsverity_info {
 	u8 file_digest[FS_VERITY_MAX_DIGEST_SIZE];
 	const struct inode *inode;
 	unsigned long *hash_block_verified;
-	spinlock_t hash_page_init_lock;
 };
 
 #define FS_VERITY_MAX_SIGNATURE_SIZE	(FS_VERITY_MAX_DESCRIPTOR_SIZE - \
diff --git a/fs/verity/open.c b/fs/verity/open.c
index 6c31a871b84b..dfb9fe6aaae9 100644
--- a/fs/verity/open.c
+++ b/fs/verity/open.c
@@ -182,6 +182,7 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
 {
 	struct fsverity_info *vi;
 	int err;
+	unsigned long num_bits;
 
 	vi = kmem_cache_zalloc(fsverity_info_cachep, GFP_KERNEL);
 	if (!vi)
@@ -213,33 +214,29 @@ struct fsverity_info *fsverity_create_info(const struct inode *inode,
 	if (err)
 		goto fail;
 
-	if (vi->tree_params.block_size != PAGE_SIZE) {
-		/*
-		 * When the Merkle tree block size and page size differ, we use
-		 * a bitmap to keep track of which hash blocks have been
-		 * verified.  This bitmap must contain one bit per hash block,
-		 * including alignment to a page boundary at the end.
-		 *
-		 * Eventually, to support extremely large files in an efficient
-		 * way, it might be necessary to make pages of this bitmap
-		 * reclaimable.  But for now, simply allocating the whole bitmap
-		 * is a simple solution that works well on the files on which
-		 * fsverity is realistically used.  E.g., with SHA-256 and 4K
-		 * blocks, a 100MB file only needs a 24-byte bitmap, and the
-		 * bitmap for any file under 17GB fits in a 4K page.
-		 */
-		unsigned long num_bits =
-			vi->tree_params.tree_pages <<
-			vi->tree_params.log_blocks_per_page;
+	/*
+	 * We use a bitmap to keep track of which hash blocks have been
+	 * verified.  This bitmap must contain one bit per hash block,
+	 * including alignment to a page boundary at the end.
+	 *
+	 * Eventually, to support extremely large files in an efficient
+	 * way, it might be necessary to make pages of this bitmap
+	 * reclaimable.  But for now, simply allocating the whole bitmap
+	 * is a simple solution that works well on the files on which
+	 * fsverity is realistically used.  E.g., with SHA-256 and 4K
+	 * blocks, a 100MB file only needs a 24-byte bitmap, and the
+	 * bitmap for any file under 17GB fits in a 4K page.
+	 */
+	num_bits =
+		vi->tree_params.tree_pages <<
+		vi->tree_params.log_blocks_per_page;
 
-		vi->hash_block_verified = kvcalloc(BITS_TO_LONGS(num_bits),
-						   sizeof(unsigned long),
-						   GFP_KERNEL);
-		if (!vi->hash_block_verified) {
-			err = -ENOMEM;
-			goto fail;
-		}
-		spin_lock_init(&vi->hash_page_init_lock);
+	vi->hash_block_verified = kvcalloc(BITS_TO_LONGS(num_bits),
+					   sizeof(unsigned long),
+					   GFP_KERNEL);
+	if (!vi->hash_block_verified) {
+		err = -ENOMEM;
+		goto fail;
 	}
 
 	return vi;
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index 2fe7bd57b16e..e7b13d143ae9 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -13,69 +13,18 @@
 static struct workqueue_struct *fsverity_read_workqueue;
 
 /*
- * Returns true if the hash block with index @hblock_idx in the tree, located in
- * @hpage, has already been verified.
+ * Returns true if the hash block with index @hblock_idx in the tree has
+ * already been verified.
  */
-static bool is_hash_block_verified(struct fsverity_info *vi, struct page *hpage,
-				   unsigned long hblock_idx)
+static bool is_hash_block_verified(struct fsverity_info *vi,
+				   unsigned long hblock_idx,
+				   bool block_cached)
 {
-	bool verified;
-	unsigned int blocks_per_page;
-	unsigned int i;
-
-	/*
-	 * When the Merkle tree block size and page size are the same, then the
-	 * ->hash_block_verified bitmap isn't allocated, and we use PG_checked
-	 * to directly indicate whether the page's block has been verified.
-	 *
-	 * Using PG_checked also guarantees that we re-verify hash pages that
-	 * get evicted and re-instantiated from the backing storage, as new
-	 * pages always start out with PG_checked cleared.
-	 */
-	if (!vi->hash_block_verified)
-		return PageChecked(hpage);
-
-	/*
-	 * When the Merkle tree block size and page size differ, we use a bitmap
-	 * to indicate whether each hash block has been verified.
-	 *
-	 * However, we still need to ensure that hash pages that get evicted and
-	 * re-instantiated from the backing storage are re-verified.  To do
-	 * this, we use PG_checked again, but now it doesn't really mean
-	 * "checked".  Instead, now it just serves as an indicator for whether
-	 * the hash page is newly instantiated or not.
-	 *
-	 * The first thread that sees PG_checked=0 must clear the corresponding
-	 * bitmap bits, then set PG_checked=1.  This requires a spinlock.  To
-	 * avoid having to take this spinlock in the common case of
-	 * PG_checked=1, we start with an opportunistic lockless read.
-	 */
-	if (PageChecked(hpage)) {
-		/*
-		 * A read memory barrier is needed here to give ACQUIRE
-		 * semantics to the above PageChecked() test.
-		 */
-		smp_rmb();
+	if (block_cached)
 		return test_bit(hblock_idx, vi->hash_block_verified);
-	}
-	spin_lock(&vi->hash_page_init_lock);
-	if (PageChecked(hpage)) {
-		verified = test_bit(hblock_idx, vi->hash_block_verified);
-	} else {
-		blocks_per_page = vi->tree_params.blocks_per_page;
-		hblock_idx = round_down(hblock_idx, blocks_per_page);
-		for (i = 0; i < blocks_per_page; i++)
-			clear_bit(hblock_idx + i, vi->hash_block_verified);
-		/*
-		 * A write memory barrier is needed here to give RELEASE
-		 * semantics to the below SetPageChecked() operation.
-		 */
-		smp_wmb();
-		SetPageChecked(hpage);
-		verified = false;
-	}
-	spin_unlock(&vi->hash_page_init_lock);
-	return verified;
+
+	clear_bit(hblock_idx, vi->hash_block_verified);
+	return false;
 }
 
 /*
@@ -179,7 +128,7 @@ verify_data_block(struct inode *inode, struct fsverity_info *vi,
 		goto error;
 	}
 	haddr = kmap_local_page(hpage) + hblock_offset_in_page;
-	if (is_hash_block_verified(vi, hpage, hblock_idx)) {
+	if (is_hash_block_verified(vi, hblock_idx, PageChecked(hpage))) {
 		memcpy(_want_hash, haddr + hoffset, hsize);
 		want_hash = _want_hash;
 		kunmap_local(haddr);
The bitmap is used to track verified status of the Merkle tree blocks which are smaller than a PAGE. Blocks which fits exactly in a page - use PageChecked() for tracking "verified" status. This patch switches to always use bitmap to track verified status. This is needed to move fs-verity away from page management and work only with Merkle tree blocks. Also, this patch removes spinlock. The lock was used to reset bits in bitmap belonging to one page. This patch works only with one Merkle tree block and won't reset other blocks status. Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com> --- fs/verity/fsverity_private.h | 1 - fs/verity/open.c | 49 ++++++++++++------------- fs/verity/verify.c | 71 +++++------------------------------- 3 files changed, 33 insertions(+), 88 deletions(-)
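The behavioral change in is_hash_block_verified() can be summarized with a small model. This is an illustrative sketch only, not kernel code: the bitmap is modeled as a plain Python list, and `block_cached` stands in for the PageChecked() state that the patched caller now passes in. If the containing page was already cached, the bitmap bit is trusted; if the page is freshly instantiated, only this one block's bit is cleared and the block is re-verified (the old code cleared the bits for every block in the page under a spinlock).

```python
def is_hash_block_verified(bitmap, hblock_idx, block_cached):
    # Model of the patched logic: a cached page means the bitmap bit is
    # authoritative; a freshly read page means this block's bit must be
    # cleared so the block gets re-verified.
    if block_cached:
        return bool(bitmap[hblock_idx])
    bitmap[hblock_idx] = 0
    return False

bm = [1, 0, 1, 1]
print(is_hash_block_verified(bm, 0, True))   # True: cached page, bit set
print(is_hash_block_verified(bm, 2, False))  # False: fresh page, bit cleared
print(bm)                                    # [1, 0, 0, 1]: only block 2 reset
```

Note that, per the review comments above, clearing only one block's bit is exactly the point of contention: with multiple Merkle tree blocks per cached page, the sibling blocks' stale bits are left untouched.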
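The sizing comment carried over into open.c ("one bit per hash block") can be sanity-checked with a short back-of-the-envelope model. This is an illustrative sketch, not the kernel's code: `merkle_bitmap_bytes` is a hypothetical helper that counts Merkle tree blocks level by level, which matches `num_bits = tree_pages << log_blocks_per_page` when the tree block size equals the page size (SHA-256, 4K blocks assumed).

```python
import math

def merkle_bitmap_bytes(file_size, block_size=4096, digest_size=32):
    # Illustrative model: one bitmap bit per Merkle tree block,
    # summing tree blocks level by level up to the root.
    hashes_per_block = block_size // digest_size   # 128 for SHA-256 / 4K
    blocks = math.ceil(file_size / block_size)     # data blocks to cover
    tree_blocks = 0
    while blocks > 1:
        blocks = math.ceil(blocks / hashes_per_block)  # next level up
        tree_blocks += blocks
    return math.ceil(tree_blocks / 8)

print(merkle_bitmap_bytes(100 * 1024 * 1024))   # 26: a 100MB file needs ~2 dozen bytes
print(merkle_bitmap_bytes(17 * 10**9) <= 4096)  # True: a 17GB file still fits in a 4K page
```

The results agree with the in-tree comment's figures ("a 100MB file only needs a 24-byte bitmap", "any file under 17GB fits in a 4K page") to within rounding, since the kernel allocates the bitmap in whole longs via BITS_TO_LONGS().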