Message ID | cover.1677750131.git.johannes.thumshirn@wdc.com (mailing list archive) |
---|---|
Series | btrfs: introduce RAID stripe tree |
On Thu, Mar 2, 2023 at 4:56 AM Johannes Thumshirn <johannes.thumshirn@wdc.com> wrote:
>
> Updates of the raid-stripe-tree are done at delayed-ref time to save on
> bandwidth while for reading we do the stripe-tree lookup on bio mapping time,
> i.e. when the logical to physical translation happens for regular btrfs RAID
> as well.
>
> The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
> its contents are the respective physical device id and position.
>
> For an example 1M write (split into 126K segments due to zone-append)
> rapido2:/home/johannes/src/fstests# xfs_io -fdc "pwrite -b 1M 0 1M" -c fsync /mnt/test/test
> wrote 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0065 sec (151.538 MiB/sec and 151.5381 ops/sec)
>
> The tree will look as follows:
>
> rapido2:/home/johannes/src/fstests# btrfs inspect-internal dump-tree -t raid_stripe /dev/nullb0
> btrfs-progs v5.16.1
> raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
> leaf 805847040 items 9 free space 15770 generation 9 owner RAID_STRIPE_TREE
> leaf 805847040 flags 0x1(WRITTEN) backref revision 1
> checksum stored 1b22e13800000000000000000000000000000000000000000000000000000000
> checksum calced 1b22e13800000000000000000000000000000000000000000000000000000000
> fs uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
> chunk uuid 6f2d8aaa-d348-4bf2-9b5e-141a37ba4c77
>     item 0 key (939524096 RAID_STRIPE_KEY 126976) itemoff 16251 itemsize 32
>         stripe 0 devid 1 offset 939524096
>         stripe 1 devid 2 offset 536870912
>     item 1 key (939651072 RAID_STRIPE_KEY 126976) itemoff 16219 itemsize 32
>         stripe 0 devid 1 offset 939651072
>         stripe 1 devid 2 offset 536997888
>     item 2 key (939778048 RAID_STRIPE_KEY 126976) itemoff 16187 itemsize 32
>         stripe 0 devid 1 offset 939778048
>         stripe 1 devid 2 offset 537124864
>     item 3 key (939905024 RAID_STRIPE_KEY 126976) itemoff 16155 itemsize 32
>         stripe 0 devid 1 offset 939905024
>         stripe 1 devid 2 offset 537251840
>     item 4 key (940032000 RAID_STRIPE_KEY 126976) itemoff 16123 itemsize 32
>         stripe 0 devid 1 offset 940032000
>         stripe 1 devid 2 offset 537378816
>     item 5 key (940158976 RAID_STRIPE_KEY 126976) itemoff 16091 itemsize 32
>         stripe 0 devid 1 offset 940158976
>         stripe 1 devid 2 offset 537505792
>     item 6 key (940285952 RAID_STRIPE_KEY 126976) itemoff 16059 itemsize 32
>         stripe 0 devid 1 offset 940285952
>         stripe 1 devid 2 offset 537632768
>     item 7 key (940412928 RAID_STRIPE_KEY 126976) itemoff 16027 itemsize 32
>         stripe 0 devid 1 offset 940412928
>         stripe 1 devid 2 offset 537759744
>     item 8 key (940539904 RAID_STRIPE_KEY 32768) itemoff 15995 itemsize 32
>         stripe 0 devid 1 offset 940539904
>         stripe 1 devid 2 offset 537886720
> total bytes 26843545600
> bytes used 1245184
> uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
>
> A design document can be found here:
> https://docs.google.com/document/d/1Iui_jMidCd4MVBNSSLXRfO7p5KmvnoQL/edit?usp=sharing&ouid=103609947580185458266&rtpof=true&sd=true
>
> The user-space part of this series can be found here:
> https://lore.kernel.org/linux-btrfs/20230215143109.2721722-1-johannes.thumshirn@wdc.com
>

Apologies if this is a stupid question, but after reading through the
patch series and the design document, it sounds like the crux of this
change is switching how RAID works to be COW like everything else.
Does that also mean RAID 56 modes benefit from this in that manner?

--
真実はいつも一つ!/ Always, there's only one truth!
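
To make the key layout in the quoted cover letter concrete, here is a minimal Python sketch of a lookup against such a tree. It is an illustration only, not the kernel code: the names `STRIPE_ITEMS` and `lookup` are hypothetical, the sorted list stands in for the on-disk b-tree, and the three entries are copied from items 0, 1 and 8 of the dump-tree output above. It assumes the key's first field is the extent's logical start, its last field is the length in bytes, and each stripe entry gives a device id and a physical start offset.

```python
from bisect import bisect_right

# (logical start, length in bytes, [(devid, physical start), ...])
# Values copied from items 0, 1 and 8 of the dump-tree output.
STRIPE_ITEMS = [
    (939524096, 126976, [(1, 939524096), (2, 536870912)]),
    (939651072, 126976, [(1, 939651072), (2, 536997888)]),
    (940539904, 32768, [(1, 940539904), (2, 537886720)]),
]


def lookup(items, logical):
    """Return [(devid, physical)] for a logical address, or None if unmapped.

    `items` must be sorted by logical start, mirroring the key order.
    """
    starts = [start for start, _, _ in items]
    # Find the last item whose start is <= logical.
    idx = bisect_right(starts, logical) - 1
    if idx < 0:
        return None
    start, length, stripes = items[idx]
    if logical >= start + length:
        return None  # falls in a gap between recorded extents
    delta = logical - start
    # Apply the same offset within the extent to every stripe.
    return [(devid, phys + delta) for devid, phys in stripes]


if __name__ == "__main__":
    print(lookup(STRIPE_ITEMS, 939524096 + 4096))
```

An address 4096 bytes into item 0 resolves to the same 4096-byte offset on both devices; an address that no recorded item covers returns `None`.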
On 02.03.23 20:39, Neal Gompa wrote:
>> A design document can be found here:
>> https://docs.google.com/document/d/1Iui_jMidCd4MVBNSSLXRfO7p5KmvnoQL/edit?usp=sharing&ouid=103609947580185458266&rtpof=true&sd=true
>>
>> The user-space part of this series can be found here:
>> https://lore.kernel.org/linux-btrfs/20230215143109.2721722-1-johannes.thumshirn@wdc.com
>>
>
> Apologies if this is a stupid question, but after reading through the
> patch series and the design document, it sounds like the crux of this
> change is switching how RAID works to be COW like everything else.
> Does that also mean RAID 56 modes benefit from this in that manner?
>

Yep that is the intention once I get far enough to have RAID56 covered.
But this is going to be the next milestone after having RAID0/1/10 done
and working properly for zoned.
Is there a plan to rebase this series to the latest misc-next branch?
Unfortunately, applying this patch fails at multiple patches.

Thanks, Anand

On 02/03/2023 17:45, Johannes Thumshirn wrote:
> Updates of the raid-stripe-tree are done at delayed-ref time to save on
> bandwidth while for reading we do the stripe-tree lookup on bio mapping time,
> i.e. when the logical to physical translation happens for regular btrfs RAID
> as well.
>
> The stripe tree is keyed by an extent's disk_bytenr and disk_num_bytes and
> its contents are the respective physical device id and position.
>
> For an example 1M write (split into 126K segments due to zone-append)
> rapido2:/home/johannes/src/fstests# xfs_io -fdc "pwrite -b 1M 0 1M" -c fsync /mnt/test/test
> wrote 1048576/1048576 bytes at offset 0
> 1 MiB, 1 ops; 0.0065 sec (151.538 MiB/sec and 151.5381 ops/sec)
>
> The tree will look as follows:
>
> rapido2:/home/johannes/src/fstests# btrfs inspect-internal dump-tree -t raid_stripe /dev/nullb0
> btrfs-progs v5.16.1
> raid stripe tree key (RAID_STRIPE_TREE ROOT_ITEM 0)
> leaf 805847040 items 9 free space 15770 generation 9 owner RAID_STRIPE_TREE
> leaf 805847040 flags 0x1(WRITTEN) backref revision 1
> checksum stored 1b22e13800000000000000000000000000000000000000000000000000000000
> checksum calced 1b22e13800000000000000000000000000000000000000000000000000000000
> fs uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
> chunk uuid 6f2d8aaa-d348-4bf2-9b5e-141a37ba4c77
>     item 0 key (939524096 RAID_STRIPE_KEY 126976) itemoff 16251 itemsize 32
>         stripe 0 devid 1 offset 939524096
>         stripe 1 devid 2 offset 536870912
>     item 1 key (939651072 RAID_STRIPE_KEY 126976) itemoff 16219 itemsize 32
>         stripe 0 devid 1 offset 939651072
>         stripe 1 devid 2 offset 536997888
>     item 2 key (939778048 RAID_STRIPE_KEY 126976) itemoff 16187 itemsize 32
>         stripe 0 devid 1 offset 939778048
>         stripe 1 devid 2 offset 537124864
>     item 3 key (939905024 RAID_STRIPE_KEY 126976) itemoff 16155 itemsize 32
>         stripe 0 devid 1 offset 939905024
>         stripe 1 devid 2 offset 537251840
>     item 4 key (940032000 RAID_STRIPE_KEY 126976) itemoff 16123 itemsize 32
>         stripe 0 devid 1 offset 940032000
>         stripe 1 devid 2 offset 537378816
>     item 5 key (940158976 RAID_STRIPE_KEY 126976) itemoff 16091 itemsize 32
>         stripe 0 devid 1 offset 940158976
>         stripe 1 devid 2 offset 537505792
>     item 6 key (940285952 RAID_STRIPE_KEY 126976) itemoff 16059 itemsize 32
>         stripe 0 devid 1 offset 940285952
>         stripe 1 devid 2 offset 537632768
>     item 7 key (940412928 RAID_STRIPE_KEY 126976) itemoff 16027 itemsize 32
>         stripe 0 devid 1 offset 940412928
>         stripe 1 devid 2 offset 537759744
>     item 8 key (940539904 RAID_STRIPE_KEY 32768) itemoff 15995 itemsize 32
>         stripe 0 devid 1 offset 940539904
>         stripe 1 devid 2 offset 537886720
> total bytes 26843545600
> bytes used 1245184
> uuid e4f523d1-89a1-41f9-ab75-6ba3c42a28fb
>
> A design document can be found here:
> https://docs.google.com/document/d/1Iui_jMidCd4MVBNSSLXRfO7p5KmvnoQL/edit?usp=sharing&ouid=103609947580185458266&rtpof=true&sd=true
>
> The user-space part of this series can be found here:
> https://lore.kernel.org/linux-btrfs/20230215143109.2721722-1-johannes.thumshirn@wdc.com
>
> Changes to v6:
> - Fix degraded RAID1 mounts
> - Fix RAID0/10 mounts
>
> v6 of the patchset can be found here:
> https://lore/kernel.org/linux-btrfs/cover.1676470614.git.johannes.thumshirn@wdc.com
>
> Changes to v5:
> - Incorporated review comments from Josef and Christoph
> - Rebased onto misc-next
>
> v5 of the patchset can be found here:
> https://lore/kernel.org/linux-btrfs/cover.1675853489.git.johannes.thumshirn@wdc.com
>
> Changes to v4:
> - Added patch to check for RST feature in sysfs
> - Added RST lookups for scrubbing
> - Fixed the error handling bug Josef pointed out
> - Only check if we need to write out a RST once per delayed_ref head
> - Added support for zoned data DUP with RST
>
> Changes to v3:
> - Rebased onto 20221120124734.18634-1-hch@lst.de
> - Incorporated Josef's review
> - Merged related patches
>
> v3 of the patchset can be found here:
> https://lore/kernel.org/linux-btrfs/cover.1666007330.git.johannes.thumshirn@wdc.com
>
> Changes to v2:
> - Bug fixes
> - Rebased onto 20220901074216.1849941-1-hch@lst.de
> - Added tracepoints
> - Added leak checker
> - Added RAID0 and RAID10
>
> v2 of the patchset can be found here:
> https://lore.kernel.org/linux-btrfs/cover.1656513330.git.johannes.thumshirn@wdc.com
>
> Changes to v1:
> - Write the stripe-tree at delayed-ref time (Qu)
> - Add a different write path for preallocation
>
> v1 of the patchset can be found here:
> https://lore.kernel.org/linux-btrfs/cover.1652711187.git.johannes.thumshirn@wdc.com/
>
> Johannes Thumshirn (13):
>   btrfs: re-add trans parameter to insert_delayed_ref
>   btrfs: add raid stripe tree definitions
>   btrfs: read raid-stripe-tree from disk
>   btrfs: add support for inserting raid stripe extents
>   btrfs: delete stripe extent on extent deletion
>   btrfs: lookup physical address from stripe extent
>   btrfs: add raid stripe tree pretty printer
>   btrfs: zoned: allow zoned RAID
>   btrfs: check for leaks of ordered stripes on umount
>   btrfs: add tracepoints for ordered stripes
>   btrfs: announce presence of raid-stripe-tree in sysfs
>   btrfs: consult raid-stripe-tree when scrubbing
>   btrfs: add raid-stripe-tree to features enabled with debug
>
>  fs/btrfs/Makefile               |   2 +-
>  fs/btrfs/accessors.h            |  29 +++
>  fs/btrfs/bio.c                  |  29 +++
>  fs/btrfs/block-rsv.c            |   1 +
>  fs/btrfs/delayed-ref.c          |  13 +-
>  fs/btrfs/delayed-ref.h          |   2 +
>  fs/btrfs/disk-io.c              |  24 ++
>  fs/btrfs/disk-io.h              |   5 +
>  fs/btrfs/extent-tree.c          |  68 ++++++
>  fs/btrfs/fs.h                   |   7 +-
>  fs/btrfs/inode.c                |  15 +-
>  fs/btrfs/print-tree.c           |  21 ++
>  fs/btrfs/raid-stripe-tree.c     | 416 ++++++++++++++++++++++++++++++++
>  fs/btrfs/raid-stripe-tree.h     |  87 +++++++
>  fs/btrfs/scrub.c                |  33 ++-
>  fs/btrfs/super.c                |   1 +
>  fs/btrfs/sysfs.c                |   3 +
>  fs/btrfs/volumes.c              |  46 +++-
>  fs/btrfs/volumes.h              |  13 +-
>  fs/btrfs/zoned.c                | 119 ++++++++-
>  include/trace/events/btrfs.h    |  50 ++++
>  include/uapi/linux/btrfs.h      |   1 +
>  include/uapi/linux/btrfs_tree.h |  20 +-
>  23 files changed, 973 insertions(+), 32 deletions(-)
>  create mode 100644 fs/btrfs/raid-stripe-tree.c
>  create mode 100644 fs/btrfs/raid-stripe-tree.h
>
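
As a quick sanity check on the quoted example, the nine keys in the dump-tree output can be verified to cover the 1 MiB write exactly. The Python sketch below is only an arithmetic check on the numbers in the dump; the name `KEYS` is hypothetical, and it assumes each key is (logical start, length in bytes). Note that the full segments are 126976 bytes each (124 KiB), which the cover letter rounds to "126K".

```python
# (logical start, length in bytes) taken from the nine RAID_STRIPE_KEY items.
KEYS = [
    (939524096, 126976),
    (939651072, 126976),
    (939778048, 126976),
    (939905024, 126976),
    (940032000, 126976),
    (940158976, 126976),
    (940285952, 126976),
    (940412928, 126976),
    (940539904, 32768),
]

# Total bytes described by the stripe tree items.
total = sum(length for _, length in KEYS)

# Each item should start exactly where the previous one ends.
contiguous = all(
    KEYS[i][0] + KEYS[i][1] == KEYS[i + 1][0] for i in range(len(KEYS) - 1)
)

if __name__ == "__main__":
    print(total, contiguous)
```

Eight full segments plus the 32768-byte tail add up to 1048576 bytes, matching the `wrote 1048576/1048576 bytes` line from xfs_io, and the items are back to back in the logical address space.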
On 03.03.23 10:30, Anand Jain wrote:
>
> Is there a plan to rebase this series to the latest misc-next branch?
> Unfortunately, applying this patch fails at multiple patches.
>

Will do. I messed up my latest rebase anyways so thanks for noticing it.