Message ID | 257397e78b93bcb888daec01c1e02be87cd4dee7.1678097398.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix block group item corruption after inserting new block group | expand |
On 2023/3/6 18:13, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > We can often end up inserting a block group item, for a new block group, > with a wrong value for the used bytes field. > > This happens if for the new allocated block group, in the same transaction > that created the block group, we have tasks allocating extents from it as > well as tasks removing extents from it. > > For example: > > 1) Task A creates a metadata block group X; > > 2) Two extents are allocated from block group X, so its "used" field is > updated to 32K, and its "commit_used" field remains as 0; > > 3) Transaction commit starts, by some task B, and it enters > btrfs_start_dirty_block_groups(). There it tries to update the block > group item for block group X, which currently has its "used" field with > a value of 32K. But that fails since the block group item was not yet > inserted, and so on failure update_block_group_item() sets the > "commit_used" field of the block group back to 0; > > 4) The block group item is inserted by task A, when for example > btrfs_create_pending_block_groups() is called when releasing its > transaction handle. This results in insert_block_group_item() inserting > the block group item in the extent tree (or block group tree), with a > "used" field having a value of 32K, but without updating the > "commit_used" field in the block group, which remains with value of 0; > > 5) The two extents are freed from block X, so its "used" field changes > from 32K to 0; > > 6) The transaction commit by task B continues, it enters > btrfs_write_dirty_block_groups() which calls update_block_group_item() > for block group X, and there it decides to skip the block group item > update, because "used" has a value of 0 and "commit_used" has a value > of 0 too. > > As a result, we end up with a block item having a 32K "used" field but > no extents allocated from it. > > When this issue happens, a btrfs check reports an error like this: > > [1/7] checking root items > [2/7] checking extents > block group [1104150528 1073741824] used 39796736 but extent items used 0 > ERROR: errors found in extent allocation tree or chunk allocation > (...) > > Fix this by making insert_block_group_item() update the block group's > "commit_used" field. > > Fixes: 7248e0cebbef ("btrfs: skip update of block group item if used bytes are the same") > CC: stable@vger.kernel.org # 6.2+ > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks for pinning down the bug, I hit it several times randomly but didn't even consider it's related to that commit. The last several runs hitting can be found here in btrfs/073: https://h.anonymoususers.xyz:8443/results/details/btrfs-rock5b-global/2023-03-05T18:37:09/index.html Thanks, Qu > --- > fs/btrfs/block-group.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index fd11d03a50ff..953e9547ca07 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -2493,18 +2493,29 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans, > struct btrfs_block_group_item bgi; > struct btrfs_root *root = btrfs_block_group_root(fs_info); > struct btrfs_key key; > + u64 old_commit_used; > + int ret; > > spin_lock(&block_group->lock); > btrfs_set_stack_block_group_used(&bgi, block_group->used); > btrfs_set_stack_block_group_chunk_objectid(&bgi, > block_group->global_root_id); > btrfs_set_stack_block_group_flags(&bgi, block_group->flags); > + old_commit_used = block_group->commit_used; > + block_group->commit_used = block_group->used; > key.objectid = block_group->start; > key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; > key.offset = block_group->length; > spin_unlock(&block_group->lock); > > - return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); > + ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); > + if (ret < 0) { > + spin_lock(&block_group->lock); > + block_group->commit_used = old_commit_used; > + spin_unlock(&block_group->lock); > + } > + > + return ret; > } > > static int insert_dev_extent(struct btrfs_trans_handle *trans,
On Mon, Mar 06, 2023 at 10:13:34AM +0000, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > We can often end up inserting a block group item, for a new block group, > with a wrong value for the used bytes field. > > This happens if for the new allocated block group, in the same transaction > that created the block group, we have tasks allocating extents from it as > well as tasks removing extents from it. > > For example: > > 1) Task A creates a metadata block group X; > > 2) Two extents are allocated from block group X, so its "used" field is > updated to 32K, and its "commit_used" field remains as 0; > > 3) Transaction commit starts, by some task B, and it enters > btrfs_start_dirty_block_groups(). There it tries to update the block > group item for block group X, which currently has its "used" field with > a value of 32K. But that fails since the block group item was not yet > inserted, and so on failure update_block_group_item() sets the > "commit_used" field of the block group back to 0; > > 4) The block group item is inserted by task A, when for example > btrfs_create_pending_block_groups() is called when releasing its > transaction handle. This results in insert_block_group_item() inserting > the block group item in the extent tree (or block group tree), with a > "used" field having a value of 32K, but without updating the > "commit_used" field in the block group, which remains with value of 0; > > 5) The two extents are freed from block X, so its "used" field changes > from 32K to 0; > > 6) The transaction commit by task B continues, it enters > btrfs_write_dirty_block_groups() which calls update_block_group_item() > for block group X, and there it decides to skip the block group item > update, because "used" has a value of 0 and "commit_used" has a value > of 0 too. > > As a result, we end up with a block item having a 32K "used" field but > no extents allocated from it. > > When this issue happens, a btrfs check reports an error like this: > > [1/7] checking root items > [2/7] checking extents > block group [1104150528 1073741824] used 39796736 but extent items used 0 > ERROR: errors found in extent allocation tree or chunk allocation > (...) > > Fix this by making insert_block_group_item() update the block group's > "commit_used" field. > > Fixes: 7248e0cebbef ("btrfs: skip update of block group item if used bytes are the same") > CC: stable@vger.kernel.org # 6.2+ > Signed-off-by: Filipe Manana <fdmanana@suse.com> Added to misc-next, thanks.
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index fd11d03a50ff..953e9547ca07 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2493,18 +2493,29 @@ static int insert_block_group_item(struct btrfs_trans_handle *trans, struct btrfs_block_group_item bgi; struct btrfs_root *root = btrfs_block_group_root(fs_info); struct btrfs_key key; + u64 old_commit_used; + int ret; spin_lock(&block_group->lock); btrfs_set_stack_block_group_used(&bgi, block_group->used); btrfs_set_stack_block_group_chunk_objectid(&bgi, block_group->global_root_id); btrfs_set_stack_block_group_flags(&bgi, block_group->flags); + old_commit_used = block_group->commit_used; + block_group->commit_used = block_group->used; key.objectid = block_group->start; key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; key.offset = block_group->length; spin_unlock(&block_group->lock); - return btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); + ret = btrfs_insert_item(trans, root, &key, &bgi, sizeof(bgi)); + if (ret < 0) { + spin_lock(&block_group->lock); + block_group->commit_used = old_commit_used; + spin_unlock(&block_group->lock); + } + + return ret; } static int insert_dev_extent(struct btrfs_trans_handle *trans,