Message ID | 20240701-b4-rst-updates-v3-1-e0437e1e04a6@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: rst: updates for RAID stripe tree |
On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> If we can't insert a stripe extent in the RAID stripe tree, because
> the key that points to the specific position in the stripe tree is
> already existing, we have to remove the item and then replace it by a
> new item.
>
> This can happen for example on device replace operations.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index e6f7a234b8f6..3020820dd6e2 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>  	return ret;
>  }
>
> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_key *key,
> +				    struct btrfs_stripe_extent *stripe_extent,
> +				    const size_t item_size)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> +	if (ret)
> +		goto err;

This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
should be

	if (ret) {
		ret = (ret == 1) ? -ENOENT : ret;
		goto err;
	}

or whatever.  Thanks,

Josef
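For readers following the review comment: btrfs_search_slot() returns 0 when the exact key is found, 1 when it is not, and a negative errno on failure, so a bare "if (ret) goto err;" can pass a positive 1 up to a caller that expects an errno. The snippet below is only a userspace sketch of the suggested mapping; the helper name search_ret_to_errno() is hypothetical and does not exist in btrfs.

```c
#include <errno.h>

/*
 * Hypothetical helper, not part of btrfs: fold the "key not found" result (1)
 * of a btrfs_search_slot()-style lookup into a negative errno, so that only
 * 0 or a negative error code ever reaches the caller.
 */
static int search_ret_to_errno(int ret)
{
	/* Same mapping as suggested in the review: 1 becomes -ENOENT. */
	return (ret == 1) ? -ENOENT : ret;
}
```

With this mapping a later btrfs_abort_transaction(trans, ret) would report -ENOENT instead of a bare 1.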
On 01.07.24 15:58, Josef Bacik wrote:
> On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> If we can't insert a stripe extent in the RAID stripe tree, because
>> the key that points to the specific position in the stripe tree is
>> already existing, we have to remove the item and then replace it by a
>> new item.
>>
>> This can happen for example on device replace operations.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
>> index e6f7a234b8f6..3020820dd6e2 100644
>> --- a/fs/btrfs/raid-stripe-tree.c
>> +++ b/fs/btrfs/raid-stripe-tree.c
>> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>>  	return ret;
>>  }
>>
>> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
>> +				    struct btrfs_key *key,
>> +				    struct btrfs_stripe_extent *stripe_extent,
>> +				    const size_t item_size)
>> +{
>> +	struct btrfs_fs_info *fs_info = trans->fs_info;
>> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
>> +	struct btrfs_path *path;
>> +	int ret;
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return -ENOMEM;
>> +
>> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
>> +	if (ret)
>> +		goto err;
>
> This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
> should be
>
> 	if (ret) {
> 		ret = (ret == 1) ? -ENOENT : ret;
> 		goto err;
> 	}
>
> or whatever.  Thanks,

I wonder why I've never seen this in my testing. Could it be, that due
to the fact that btrfs_insert_item() returns -EEXIST on the same
key.objectid, we're more or less guaranteed it'll exist.
On Mon, Jul 01, 2024 at 03:08:22PM +0000, Johannes Thumshirn wrote:
> On 01.07.24 15:58, Josef Bacik wrote:
> > On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >>
> >> If we can't insert a stripe extent in the RAID stripe tree, because
> >> the key that points to the specific position in the stripe tree is
> >> already existing, we have to remove the item and then replace it by a
> >> new item.
> >>
> >> This can happen for example on device replace operations.
> >>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
> >>  1 file changed, 34 insertions(+)
> >>
> >> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> >> index e6f7a234b8f6..3020820dd6e2 100644
> >> --- a/fs/btrfs/raid-stripe-tree.c
> >> +++ b/fs/btrfs/raid-stripe-tree.c
> >> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
> >>  	return ret;
> >>  }
> >>
> >> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> >> +				    struct btrfs_key *key,
> >> +				    struct btrfs_stripe_extent *stripe_extent,
> >> +				    const size_t item_size)
> >> +{
> >> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> >> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> >> +	struct btrfs_path *path;
> >> +	int ret;
> >> +
> >> +	path = btrfs_alloc_path();
> >> +	if (!path)
> >> +		return -ENOMEM;
> >> +
> >> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> >> +	if (ret)
> >> +		goto err;
> >
> > This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
> > should be
> >
> > 	if (ret) {
> > 		ret = (ret == 1) ? -ENOENT : ret;
> > 		goto err;
> > 	}
> >
> > or whatever.  Thanks,
>
> I wonder why I've never seen this in my testing. Could it be, that due
> to the fact that btrfs_insert_item() returns -EEXIST on the same
> key.objectid, we're more or less guaranteed it'll exist.

Yeah it's fine in the way it is currently, but if anything changes in the
future we're going to figure it out and be super sad we didn't just handle it
right in the first place.  Thanks,

Josef
On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> If we can't insert a stripe extent in the RAID stripe tree, because
> the key that points to the specific position in the stripe tree is
> already existing, we have to remove the item and then replace it by a
> new item.
>
> This can happen for example on device replace operations.
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index e6f7a234b8f6..3020820dd6e2 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>  	return ret;
>  }
>
> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_key *key,
> +				    struct btrfs_stripe_extent *stripe_extent,
> +				    const size_t item_size)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> +	if (ret)
> +		goto err;
> +
> +	ret = btrfs_del_item(trans, stripe_root, path);
> +	if (ret)
> +		goto err;
> +
> +	btrfs_free_path(path);
> +
> +	return btrfs_insert_item(trans, stripe_root, key, stripe_extent,
> +				 item_size);
> + err:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
>  static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
>  					struct btrfs_io_context *bioc)
>  {
> @@ -112,6 +143,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
>
>  	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
>  				item_size);
> +	if (ret == -EEXIST)
> +		ret = replace_raid_extent_item(trans, &stripe_key,
> +					       stripe_extent, item_size);

I had another thought, how often is this particular thing happening?  Because
we're doing 3 path allocations here in the worst case.  If it happens more than
say 10% of the time then we need to allocate a path once in
btrfs_insert_one_raid_extent(), do the insert, and if it fails re-use that path
to do the delete and insert the new one.  Thanks,

Josef
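The single-allocation flow described in this reply can be sketched in userspace as follows. Everything here is a mock under stated assumptions: struct mock_tree holds at most one item, and mock_alloc_path(), mock_insert(), mock_search(), mock_delete() and mock_release_path() are hypothetical stand-ins rather than btrfs functions. The sketch only illustrates the control flow (allocate once, try the insert, on -EEXIST reuse the same path for search, delete and re-insert); it is not the code that was eventually merged.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mock stand-ins; none of these are the real btrfs types or functions. */
struct mock_path { int slot; };
struct mock_tree {
	bool has_item;            /* the tree holds at most one item */
	int value;                /* payload of that item */
	unsigned int path_allocs; /* how many paths were allocated */
};

static struct mock_path *mock_alloc_path(struct mock_tree *tree)
{
	tree->path_allocs++;
	return calloc(1, sizeof(struct mock_path));
}

/* Reset the path so it can be reused for the next tree operation. */
static void mock_release_path(struct mock_path *path)
{
	path->slot = 0;
}

static int mock_insert(struct mock_tree *tree, struct mock_path *path, int value)
{
	(void)path;
	if (tree->has_item)
		return -EEXIST;
	tree->has_item = true;
	tree->value = value;
	return 0;
}

/* Search convention assumed here: 0 = found, 1 = not found. */
static int mock_search(struct mock_tree *tree, struct mock_path *path)
{
	(void)path;
	return tree->has_item ? 0 : 1;
}

static int mock_delete(struct mock_tree *tree, struct mock_path *path)
{
	(void)path;
	tree->has_item = false;
	return 0;
}

/* One path allocation covers insert, and on -EEXIST also delete + re-insert. */
static int insert_or_replace(struct mock_tree *tree, int value)
{
	struct mock_path *path = mock_alloc_path(tree);
	int ret;

	if (!path)
		return -ENOMEM;

	ret = mock_insert(tree, path, value);
	if (ret != -EEXIST)
		goto out;

	mock_release_path(path);
	ret = mock_search(tree, path);
	if (ret) {
		ret = (ret == 1) ? -ENOENT : ret;
		goto out;
	}

	ret = mock_delete(tree, path);
	if (ret)
		goto out;

	mock_release_path(path);
	ret = mock_insert(tree, path, value);
out:
	free(path);
	return ret;
}
```

In this mock, replacing an existing item costs one path allocation per call, compared with the three counted for the worst case of the posted patch.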
On 01.07.24 22:37, Josef Bacik wrote:
>> +	if (ret == -EEXIST)
>> +		ret = replace_raid_extent_item(trans, &stripe_key,
>> +					       stripe_extent, item_size);
>
> I had another thought, how often is this particular thing happening?  Because
> we're doing 3 path allocations here in the worst case.  If it happens more than
> say 10% of the time then we need to allocate a path once in
> btrfs_insert_one_raid_extent(), do the insert, and if it fails re-use that path
> to do the delete and insert the new one.  Thanks,

That indeed is a good question. I'll add some tracepoints to see how
often this is getting called.
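The measurement discussed here boils down to comparing the number of -EEXIST fallbacks with the total number of inserts. The sketch below uses plain counters as a stand-in for kernel tracepoints, which cannot be shown in a self-contained userspace example; struct replace_stats, record_insert() and replace_is_frequent() are hypothetical names, and the 10% figure is simply the threshold floated in the review.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for what a tracepoint would report. */
struct replace_stats {
	unsigned long inserts;  /* every attempted stripe extent insert */
	unsigned long replaces; /* inserts that hit -EEXIST and fell back */
};

/* Record the outcome of one insert attempt. */
static void record_insert(struct replace_stats *stats, int insert_ret)
{
	stats->inserts++;
	if (insert_ret == -EEXIST)
		stats->replaces++;
}

/* True if the fallback ran on more than threshold_pct percent of inserts. */
static bool replace_is_frequent(const struct replace_stats *stats,
				unsigned int threshold_pct)
{
	if (stats->inserts == 0)
		return false;
	return stats->replaces * 100 > stats->inserts * threshold_pct;
}
```

If replace_is_frequent(&stats, 10) held under a realistic workload, that would argue for the single-path variant; otherwise the simpler posted code may be good enough.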
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index e6f7a234b8f6..3020820dd6e2 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	return ret;
 }
 
+static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
+				    struct btrfs_key *key,
+				    struct btrfs_stripe_extent *stripe_extent,
+				    const size_t item_size)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_path *path;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
+	if (ret)
+		goto err;
+
+	ret = btrfs_del_item(trans, stripe_root, path);
+	if (ret)
+		goto err;
+
+	btrfs_free_path(path);
+
+	return btrfs_insert_item(trans, stripe_root, key, stripe_extent,
+				 item_size);
+ err:
+	btrfs_free_path(path);
+	return ret;
+}
+
 static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 					struct btrfs_io_context *bioc)
 {
@@ -112,6 +143,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
 				item_size);
+	if (ret == -EEXIST)
+		ret = replace_raid_extent_item(trans, &stripe_key,
+					       stripe_extent, item_size);
 	if (ret)
 		btrfs_abort_transaction(trans, ret);