Message ID | 20240711-b4-rst-updates-v2-3-d7b8113d88b7@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: more RAID stripe tree updates | expand |
在 2024/7/11 15:51, Johannes Thumshirn 写道: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > btrfs_delete_raid_extent() was written under the assumption, that it's > call-chain always passes a start, length tuple that matches a single > extent. But btrfs_delete_raid_extent() is called by > do_free_extent_acounting() which in term is called by > __btrfs_free_extent(). But from the call site __btrfs_free_extent(), it is still called for a single extent. Or we will get an error and abort the current transaction. > > But this call-chain passes in a start address and a length that can > possibly match multiple on-disk extents. Mind to give a more detailed example on this? Thanks, Qu > > To make this possible, we have to adjust the start and length of each > btree node lookup, to not delete beyond the requested range. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/raid-stripe-tree.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c > index fd56535b2289..6f65be334637 100644 > --- a/fs/btrfs/raid-stripe-tree.c > +++ b/fs/btrfs/raid-stripe-tree.c > @@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > if (ret) > break; > > + start += key.offset; > + length -= key.offset; > + if (length == 0) > + break; > + > btrfs_release_path(path); > } > >
在 2024/7/11 16:25, Qu Wenruo 写道: > > > 在 2024/7/11 15:51, Johannes Thumshirn 写道: >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> >> btrfs_delete_raid_extent() was written under the assumption, that it's >> call-chain always passes a start, length tuple that matches a single >> extent. But btrfs_delete_raid_extent() is called by >> do_free_extent_acounting() which in term is called by > >> __btrfs_free_extent(). > > But from the call site __btrfs_free_extent(), it is still called for a > single extent. > > Or we will get an error and abort the current transaction. Or does it mean, one data extent can have multiple RST entries? Is that a non-zoned RST specific behavior? As I still remember that we split ordered extents for zoned devices, so that it should always have one extent for each split OE. Thanks, Qu > >> >> But this call-chain passes in a start address and a length that can >> possibly match multiple on-disk extents. > > Mind to give a more detailed example on this? > > Thanks, > Qu > >> >> To make this possible, we have to adjust the start and length of each >> btree node lookup, to not delete beyond the requested range. >> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> --- >> fs/btrfs/raid-stripe-tree.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c >> index fd56535b2289..6f65be334637 100644 >> --- a/fs/btrfs/raid-stripe-tree.c >> +++ b/fs/btrfs/raid-stripe-tree.c >> @@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct >> btrfs_trans_handle *trans, u64 start, u64 le >> if (ret) >> break; >> + start += key.offset; >> + length -= key.offset; >> + if (length == 0) >> + break; >> + >> btrfs_release_path(path); >> } >> >
在 2024/7/11 17:14, Qu Wenruo 写道: > > > 在 2024/7/11 16:25, Qu Wenruo 写道: >> >> >> 在 2024/7/11 15:51, Johannes Thumshirn 写道: >>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>> >>> btrfs_delete_raid_extent() was written under the assumption, that it's >>> call-chain always passes a start, length tuple that matches a single >>> extent. But btrfs_delete_raid_extent() is called by >>> do_free_extent_acounting() which in term is called by > >>> __btrfs_free_extent(). >> >> But from the call site __btrfs_free_extent(), it is still called for a >> single extent. >> >> Or we will get an error and abort the current transaction. > > Or does it mean, one data extent can have multiple RST entries? > > Is that a non-zoned RST specific behavior? > As I still remember that we split ordered extents for zoned devices, so > that it should always have one extent for each split OE. OK, it's indeed an RST specific behavior (at least for RST with non-zoned devices). I can have the following layout: item 15 key (258 EXTENT_DATA 419430400) itemoff 15306 itemsize 53 generation 10 type 1 (regular) extent data disk byte 1808793600 nr 117440512 extent data offset 0 nr 117440512 ram 117440512 extent compression 0 (none) Which is a large data extent with 112MiB length. Meanwhile for the RST entries there are 3 split ones: item 13 key (1808793600 RAID_STRIPE 33619968) itemoff 15835 itemsize 32 stripe 0 devid 2 physical 1787822080 stripe 1 devid 1 physical 1808793600 item 14 key (1842413568 RAID_STRIPE 58789888) itemoff 15803 itemsize 32 stripe 0 devid 2 physical 1821442048 stripe 1 devid 1 physical 1842413568 item 15 key (1901203456 RAID_STRIPE 25030656) itemoff 15771 itemsize 32 stripe 0 devid 2 physical 1880231936 stripe 1 devid 1 physical 1901203456 So yes, it's possible to have multiple RST entries for a single data extent, it's no longer the old zoned behavior. In that case, the patch looks fine to me. Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > > Thanks, > Qu >> >>> >>> But this call-chain passes in a start address and a length that can >>> possibly match multiple on-disk extents. >> >> Mind to give a more detailed example on this? >> >> Thanks, >> Qu >> >>> >>> To make this possible, we have to adjust the start and length of each >>> btree node lookup, to not delete beyond the requested range. >>> >>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>> --- >>> fs/btrfs/raid-stripe-tree.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c >>> index fd56535b2289..6f65be334637 100644 >>> --- a/fs/btrfs/raid-stripe-tree.c >>> +++ b/fs/btrfs/raid-stripe-tree.c >>> @@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct >>> btrfs_trans_handle *trans, u64 start, u64 le >>> if (ret) >>> break; >>> + start += key.offset; >>> + length -= key.offset; >>> + if (length == 0) >>> + break; >>> + >>> btrfs_release_path(path); >>> } >>> >> >
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c index fd56535b2289..6f65be334637 100644 --- a/fs/btrfs/raid-stripe-tree.c +++ b/fs/btrfs/raid-stripe-tree.c @@ -66,6 +66,11 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le if (ret) break; + start += key.offset; + length -= key.offset; + if (length == 0) + break; + btrfs_release_path(path); }