Message ID: cover.1728608421.git.anand.jain@oracle.com (mailing list archive)
Series: raid1 balancing methods
On 11/10/24 8:19 am, Anand Jain wrote:
> v2:
> 1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of
>    CONFIG_BTRFS_DEBUG.
> 2. Correct the typo from %est_wait to %best_wait.
> 3. Initialize %best_wait to U64_MAX and remove the check for 0.
> 4. Implement rotation with a minimum contiguous read threshold before
>    switching to the next stripe. Configure this using:
>
>      echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
>    The default value is the sector size, and the min_contiguous_read
>    value must be a multiple of the sector size.
>
> 5. Tested FIO random read/write and defrag compression workloads with
>    min_contiguous_read set to sector size, 192k, and 256k.
>
> The rotation RAID1 balancing method is better for multi-process
> workloads such as fio and also single-process workloads such as
> defragmentation.
>
> $ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
>       --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
>       --time_based --group_reporting --name=iops-test-job --eta-newline=1
>
> |         |           |           | Read I/O count  |
> |         | Read      | Write     | devid1 | devid2 |
> |---------|-----------|-----------|--------|--------|
> | pid     | 20.3MiB/s | 20.5MiB/s | 313895 | 313895 |
> | rotation|           |           |        |        |
> |     4096| 20.4MiB/s | 20.5MiB/s | 313895 | 313895 |
> |   196608| 20.2MiB/s | 20.2MiB/s | 310152 | 310175 |
> |   262144| 20.3MiB/s | 20.4MiB/s | 312180 | 312191 |
> | latency | 18.4MiB/s | 18.4MiB/s | 272980 | 291683 |
> | devid:1 | 14.8MiB/s | 14.9MiB/s | 456376 | 0      |
>
> The rotation RAID1 balancing technique performs more than 2x better for
> single-process defrag.
>
> $ time -p btrfs filesystem defrag -r -f -c /btrfs
>
> |         | Time  | Read I/O Count  |
> |         | Real  | devid1 | devid2 |
> |---------|-------|--------|--------|
> | pid     | 18.00s| 3800   | 0      |
> | rotation|       |        |        |
> |     4096|  8.95s| 1900   | 1901   |
> |   196608|  8.50s| 1881   | 1919   |
> |   262144|  8.80s| 1881   | 1919   |
> | latency | 17.18s| 3800   | 0      |
> | devid:2 | 17.48s| 0      | 3800   |

Copy and paste error. Please ignore the paragraph below. Thx.

---vvv--- ignore ---vvv---

> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method is preferable as default. More workload testing is
> needed while the code is EXPERIMENTAL.
> While Latency is better during a failing/unstable block layer transport.
> As of now these two techniques need to be independently tested with
> different workloads, and in the long term we should merge them into a
> unified heuristic.

---^^^--- ignore ---^^^---

> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method should be the default. More workload testing is needed
> while the code is EXPERIMENTAL.
>
> Latency is smarter with an unstable block layer transport.
>
> Both techniques need independent testing across workloads, with the
> goal of eventually merging them into a unified approach for the long
> term.
>
> Devid is a hands-on approach that provides manual or user-space script
> control.
>
> These RAID1 balancing methods are tunable via the sysfs knob.
> The mount -o option and btrfs properties are under consideration.
>
> Thx.
>
> --------- original v1 ------------
>
> The RAID1 balancing methods help distribute read I/O across devices,
> and this patch set introduces three balancing methods: rotation,
> latency, and devid. These methods are enabled under the
> `CONFIG_BTRFS_DEBUG` config option and sit on top of the previously
> added `/sys/fs/btrfs/<UUID>/read_policy` interface to configure the
> desired RAID1 read balancing method.
>
> I've tested these patches using fio and filesystem defragmentation
> workloads on a two-device RAID1 setup (with both data and metadata
> mirrored across identical devices). I tracked device read counts by
> extracting stats from `/sys/devices/<..>/stat` for each device. Below
> is a summary of the results, with each result the average of three
> iterations.
>
> A typical generic random rw workload:
>
> $ fio --filename=/btrfs/foo --size=10Gi --direct=1 --rw=randrw --bs=4k \
>       --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
>       --time_based --group_reporting --name=iops-test-job --eta-newline=1
>
> |         |           |           | Read I/O count  |
> |         | Read      | Write     | devid1 | devid2 |
> |---------|-----------|-----------|--------|--------|
> | pid     | 29.4MiB/s | 29.5MiB/s | 456548 | 447975 |
> | rotation| 29.3MiB/s | 29.3MiB/s | 450105 | 450055 |
> | latency | 21.9MiB/s | 21.9MiB/s | 672387 | 0      |
> | devid:1 | 22.0MiB/s | 22.0MiB/s | 674788 | 0      |
>
> Defragmentation with compression workload:
>
> $ xfs_io -f -d -c 'pwrite -S 0xab 0 1G' /btrfs/foo
> $ sync
> $ echo 3 > /proc/sys/vm/drop_caches
> $ btrfs filesystem defrag -f -c /btrfs/foo
>
> |         | Time  | Read I/O Count  |
> |         | Real  | devid1 | devid2 |
> |---------|-------|--------|--------|
> | pid     | 21.61s| 3810   | 0      |
> | rotation| 11.55s| 1905   | 1905   |
> | latency | 20.99s| 0      | 3810   |
> | devid:2 | 21.41s| 0      | 3810   |
>
> . The PID-based balancing method works well for the generic random rw
>   fio workload.
> . The rotation method is ideal when you want to keep both devices
>   active, and it boosts performance in sequential defragmentation
>   scenarios.
> . The latency-based method works well when we have mixed device types,
>   or when one device experiences intermittent I/O failures: the latency
>   increases and it automatically picks the other device for further
>   read IOs.
> . The devid method is a more hands-on approach, useful for diagnosing
>   and testing RAID1 mirror synchronization.
>
> Anand Jain (3):
>   btrfs: introduce RAID1 round-robin read balancing
>   btrfs: use the path with the lowest latency for RAID1 reads
>   btrfs: add RAID1 preferred read device
>
>  fs/btrfs/disk-io.c |   4 ++
>  fs/btrfs/sysfs.c   | 116 +++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/volumes.c | 109 ++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |  16 +++++++
>  4 files changed, 230 insertions(+), 15 deletions(-)
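For illustration, here is a minimal userspace model of the rotation policy
described above: reads stay on one mirror until a minimum number of
contiguous bytes has been issued to it, then rotate to the next mirror.
The structure and function names are hypothetical — a sketch of the idea,
not the patch's actual code.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical model of rotation with a min_contiguous_read threshold. */
    struct rr_state {
            int cur_mirror;         /* mirror currently serving reads */
            uint64_t issued;        /* bytes issued since the last switch */
            int num_mirrors;
            uint64_t min_contig;    /* multiple of the sector size */
    };

    static int rr_pick_mirror(struct rr_state *s, uint64_t read_len)
    {
            /* Stay on the current mirror until it has served at least
             * min_contig bytes, then rotate to the next one. */
            if (s->issued >= s->min_contig) {
                    s->cur_mirror = (s->cur_mirror + 1) % s->num_mirrors;
                    s->issued = 0;
            }
            s->issued += read_len;
            return s->cur_mirror;
    }

    int main(void)
    {
            struct rr_state s = { .num_mirrors = 2, .min_contig = 262144 };

            /* Eight 64k reads with a 256k threshold: the first four go to
             * mirror 0, the next four to mirror 1. */
            for (int i = 0; i < 8; i++)
                    printf("read %d -> mirror %d\n", i,
                           rr_pick_mirror(&s, 65536));
            return 0;
    }

Setting min_contig to the sector size reduces this to strict per-request
alternation, which is the default the thread goes on to debate.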
On 2024/10/11 13:19, Anand Jain wrote:
> v2:
> 1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of
>    CONFIG_BTRFS_DEBUG.
> 2. Correct the typo from %est_wait to %best_wait.
> 3. Initialize %best_wait to U64_MAX and remove the check for 0.
> 4. Implement rotation with a minimum contiguous read threshold before
>    switching to the next stripe. Configure this using:
>
>      echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
>    The default value is the sector size, and the min_contiguous_read
>    value must be a multiple of the sector size.

Overall, I'm fine with the latency and preferred device policies.

Meanwhile, I'd prefer the previous version of round-robin, without the
min_contiguous_read. That looks a little like overkill, and I think we
should keep the policy as simple as possible for now.

Mind sharing why the min_contiguous_read is introduced in this update?

In the future, we should go the same route as sched_ext, by pushing the
complex policies to eBPF programs.

Another future improvement is the interface. I'm fine with the sysfs knob
for an experimental feature, but from my drop_subtree_threshold
experience, sysfs is not going to be a user-friendly interface; it really
relies on some user-space daemon to set it.

I'd prefer something more persistent, like some XATTR inside the root
tree, exposed through the prop interface. But that can all be done in the
future.

Thanks,
Qu
On 11/10/24 10:29 am, Qu Wenruo wrote:
> On 2024/10/11 13:19, Anand Jain wrote:
>> v2:
>> [...]
>> 4. Implement rotation with a minimum contiguous read threshold before
>>    switching to the next stripe. Configure this using:
>>
>>      echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>>
>>    The default value is the sector size, and the min_contiguous_read
>>    value must be a multiple of the sector size.
>
> Overall, I'm fine with the latency and preferred device policies.
>
> Meanwhile, I'd prefer the previous version of round-robin, without the
> min_contiguous_read. That looks a little like overkill, and I think we
> should keep the policy as simple as possible for now.
>
> Mind sharing why the min_contiguous_read is introduced in this update?

The reason for adding min_contiguous_read: the block layer optimizes with
bio merging to improve HDD performance. David mentioned on Slack that
192k to 256k contiguous reads can perform better; I haven't seen this in
my setup, but it may work in others.

> In the future, we should go the same route as sched_ext, by pushing the
> complex policies to eBPF programs.

External scripts for RAID1 balancing are achievable with BPF, though that
requires writable BPF, which is disabled in some cases. That said, we
should still prioritize adding support and give the use case the choice.

> Another future improvement is the interface. I'm fine with the sysfs
> knob for an experimental feature.

Yes, we need to review the tunables - mount options, sysfs, and btrfs
properties - to arrive at clear guidelines and consolidation.

> But from my drop_subtree_threshold experience, sysfs is not going to be
> a user-friendly interface; it really relies on some user-space daemon
> to set it.

Agreed. However, for Btrfs, sysfs has been the most comprehensive
interface so far.

> I'd prefer something more persistent, like some XATTR inside the root
> tree, exposed through the prop interface. But that can all be done in
> the future.

Absolutely. I included that in earlier experiments, but it was removed
due to review comments. Now isn't the right time to reintroduce it; we
can update the on-disk format and xattrs once the in-memory
implementation graduates and addresses specific use cases.

Thanks, Anand
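The latency policy referenced in the v2 notes (items 2 and 3: %best_wait
initialized to U64_MAX) can be modelled roughly as picking the mirror with
the lowest estimated wait. In this userspace sketch the per-device
statistics and their names are assumptions, not the actual btrfs counters.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed per-mirror statistics; the real patch derives its estimate
     * from in-flight IO and completion times, and details may differ. */
    struct mirror_stat {
            uint64_t inflight;      /* reads currently queued */
            uint64_t avg_read_ns;   /* running average completion time */
    };

    static int pick_lowest_latency(const struct mirror_stat *m, int num)
    {
            uint64_t best_wait = UINT64_MAX; /* cf. %best_wait = U64_MAX */
            int best = 0;

            for (int i = 0; i < num; i++) {
                    /* Estimated wait: queued IOs times average completion
                     * time. A slow or failing path accumulates a large
                     * average and stops being picked. */
                    uint64_t est_wait = m[i].inflight * m[i].avg_read_ns;

                    if (est_wait < best_wait) {
                            best_wait = est_wait;
                            best = i;
                    }
            }
            return best;
    }

    int main(void)
    {
            struct mirror_stat mirrors[2] = {
                    { .inflight = 4, .avg_read_ns = 120000 }, /* healthy */
                    { .inflight = 1, .avg_read_ns = 900000 }, /* unstable */
            };

            printf("picked mirror %d\n", pick_lowest_latency(mirrors, 2));
            return 0;
    }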
On Fri, Oct 11, 2024 at 10:49:15AM +0800, Anand Jain wrote:
> v2:
> 1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of
>    CONFIG_BTRFS_DEBUG.
> 2. Correct the typo from %est_wait to %best_wait.
> 3. Initialize %best_wait to U64_MAX and remove the check for 0.
> 4. Implement rotation with a minimum contiguous read threshold before
>    switching to the next stripe. Configure this using:
>
>      echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
>    The default value is the sector size, and the min_contiguous_read
>    value must be a multiple of the sector size.

I think it's safe to start with the round-robin policy, but the syntax is
strange - why are the [ ] mandatory? Also, please call it round-robin, or
'rr' for short.

The default of sector size is IMHO a wrong value; switching devices that
often will drop the performance just because of the io request overhead.
From what I remember, values around 200k were reasonable, so either 192k
or 256k should be the default. We may also drop the configurable value
altogether and provide a few hard-coded sizes like rr-256k, rr-512k,
rr-1m, if only to drop the parsing of user strings.

> 5. Tested FIO random read/write and defrag compression workloads with
>    min_contiguous_read set to sector size, 192k, and 256k.
>
> [...]
>
> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method is preferable as default. More workload testing is
> needed while the code is EXPERIMENTAL.

Yeah, round-robin will be a good default; we only need to verify the
chunk size and then do the switch in the next release.

> While Latency is better during a failing/unstable block layer transport.
> As of now these two techniques need to be independently tested with
> different workloads, and in the long term we should merge them into a
> unified heuristic.

This sounds like the latency policy is good for a specific case, and
maybe a fallback if the device becomes faulty, but once the layer below
becomes unstable we may need to stop reading from the device altogether.
This is also a different mode of operation than balancing reads.

> Rotation keeps all devices active, and for now, the Rotation RAID1
> balancing method should be the default. More workload testing is needed
> while the code is EXPERIMENTAL.
>
> Latency is smarter with an unstable block layer transport.
>
> Both techniques need independent testing across workloads, with the
> goal of eventually merging them into a unified approach for the long
> term.
>
> Devid is a hands-on approach that provides manual or user-space script
> control.
>
> These RAID1 balancing methods are tunable via the sysfs knob.
> The mount -o option and btrfs properties are under consideration.

To move forward with the feature, I think the round-robin and preferred
device id policies can be merged. I'm not sure about the latency one, but
if it's under experimental we can take it as is and tune it later.

I'll check my notes from the last time Michal attempted to implement the
policies, to make sure we haven't missed something.
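Whichever naming wins, the string handling is small either way. A
hypothetical parser for a "round-robin[:min_contiguous_read]" syntax,
applying the validation rules from the cover letter (value optional,
non-zero multiple of the sector size) with the 256k default suggested
above, could look like the sketch below; none of it is the actual sysfs
store code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define SECTOR_SIZE 4096
    #define RR_DEFAULT  (256 * 1024)

    /* Hypothetical policy-string parser; returns 0 on success. */
    static int parse_read_policy(const char *buf, unsigned long *min_contig)
    {
            const char *sep = strchr(buf, ':');
            char *end;
            unsigned long val;

            *min_contig = 0;
            if (strncmp(buf, "round-robin", 11) != 0 ||
                (buf[11] != '\0' && buf[11] != ':'))
                    return -1;

            *min_contig = RR_DEFAULT;
            if (!sep)
                    return 0;       /* no value given, use the default */

            val = strtoul(sep + 1, &end, 10);
            /* Must be a non-zero multiple of the sector size. */
            if (end == sep + 1 || *end != '\0' || val == 0 ||
                val % SECTOR_SIZE)
                    return -1;
            *min_contig = val;
            return 0;
    }

    int main(void)
    {
            const char *tests[] = { "round-robin", "round-robin:262144",
                                    "round-robin:1000", "latency" };
            unsigned long v;

            for (int i = 0; i < 4; i++) {
                    if (parse_read_policy(tests[i], &v))
                            printf("%-20s -> invalid\n", tests[i]);
                    else
                            printf("%-20s -> min_contiguous_read=%lu\n",
                                   tests[i], v);
            }
            return 0;
    }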
Anand Jain wrote:
> v2:
> [...]
> 5. Tested FIO random read/write and defrag compression workloads with
>    min_contiguous_read set to sector size, 192k, and 256k.
>
> The rotation RAID1 balancing method is better for multi-process
> workloads such as fio and also single-process workloads such as
> defragmentation.

With this functionality added, would it not also make sense to add a
RAID0/10 profile that limits the stripe width, so a stripe does not span
more than n disks (for example n=4)?

On systems with, for example, 24 disks in RAID10, a read may activate 12
disks at the same time, which could easily saturate the bus.

Therefore, if a storage profile existed that limits the number of devices
a stripe occupies, it seems like there might be possibilities for
RAID0/10 as well.

Note that, as of writing this, I believe RAID0/10/5/6 make the stripe as
wide as the number of storage devices available to the filesystem. If I
am wrong about this, please ignore my jabbering and move on.
Thanks for commenting.

On 21/10/24 22:05, David Sterba wrote:
> On Fri, Oct 11, 2024 at 10:49:15AM +0800, Anand Jain wrote:
>> [...]
>
> I think it's safe to start with the round-robin policy, but the syntax
> is strange - why are the [ ] mandatory? Also, please call it
> round-robin, or 'rr' for short.

I'm fine with round-robin. The [ ] part is not mandatory; if the
min_contiguous_read value is not specified, it will default to a
predefined value.

> The default of sector size is IMHO a wrong value; switching devices
> that often will drop the performance just because of the io request
> overhead. From what I remember, values around 200k were reasonable, so
> either 192k or 256k should be the default. We may also drop the
> configurable value altogether and provide a few hard-coded sizes like
> rr-256k, rr-512k, rr-1m, if only to drop the parsing of user strings.

I'm okay with a default value of 256k. For the experimental feature, we
can keep it configurable, allowing the opportunity to experiment with
other values as well.

>> [...]
>>
>> Rotation keeps all devices active, and for now, the Rotation RAID1
>> balancing method is preferable as default. More workload testing is
>> needed while the code is EXPERIMENTAL.
>
> Yeah, round-robin will be a good default; we only need to verify the
> chunk size and then do the switch in the next release.

Yes..

>> While Latency is better during a failing/unstable block layer
>> transport. As of now these two techniques need to be independently
>> tested with different workloads, and in the long term we should merge
>> them into a unified heuristic.
>
> This sounds like the latency policy is good for a specific case, and
> maybe a fallback if the device becomes faulty, but once the layer below
> becomes unstable we may need to stop reading from the device
> altogether. This is also a different mode of operation than balancing
> reads.

If the latency on the faulty path is high enough, that path won't be
picked at all, so it works. However, round-robin balancing is unaware of
dynamic faults on the device path. IMO, a round-robin method that is
latency aware (with ~20% variance) would be better.

>> [...]
>>
>> These RAID1 balancing methods are tunable via the sysfs knob.
>> The mount -o option and btrfs properties are under consideration.
>
> To move forward with the feature, I think the round-robin and preferred
> device id policies can be merged. I'm not sure about the latency one,
> but if it's under experimental we can take it as is and tune it later.

I hope the experimental feature also means we can change the name of the
balancing method at any time. Once we have tested a fair combination of
block device types, we'll definitely need a method that can automatically
tune based on the device type, which will require adding or dropping
balancing methods accordingly.

> I'll check my notes from the last time Michal attempted to implement
> the policies, to make sure we haven't missed something.

Thanks, Anand
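Anand's latency-aware round-robin idea could be sketched as a rotation
that vetoes any mirror whose estimated wait is more than ~20% above the
current best. This is purely illustrative - the names and numbers are
invented, and the reply below argues the faulty-path case should be
handled separately anyway.

    #include <stdint.h>
    #include <stdio.h>

    struct mirror {
            uint64_t est_wait_ns;   /* assumed per-mirror wait estimate */
    };

    /* Rotate between mirrors, skipping any that is >20% slower than the
     * best current estimate. The best mirror always passes the check, so
     * the walk always terminates with a pick. */
    static int pick_rr_latency_aware(const struct mirror *m, int num,
                                     int *next)
    {
            uint64_t best = UINT64_MAX;
            int cand;

            for (int i = 0; i < num; i++)
                    if (m[i].est_wait_ns < best)
                            best = m[i].est_wait_ns;

            for (int tried = 0; ; tried++) {
                    cand = (*next + tried) % num;
                    if (m[cand].est_wait_ns <= best + best / 5)
                            break;
            }
            *next = (cand + 1) % num;
            return cand;
    }

    int main(void)
    {
            struct mirror mirrors[2] = { { 100000 }, { 500000 } };
            int next = 0;

            /* Mirror 1 is 5x slower, so the rotation collapses onto
             * mirror 0 until mirror 1 recovers. */
            for (int i = 0; i < 4; i++)
                    printf("read %d -> mirror %d\n", i,
                           pick_rr_latency_aware(mirrors, 2, &next));
            return 0;
    }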
On 21/10/24 22:32, waxhead wrote:
> Anand Jain wrote:
>> [...]
>
> With this functionality added, would it not also make sense to add a
> RAID0/10 profile that limits the stripe width, so a stripe does not
> span more than n disks (for example n=4)?
>
> On systems with, for example, 24 disks in RAID10, a read may activate
> 12 disks at the same time, which could easily saturate the bus.
>
> Therefore, if a storage profile existed that limits the number of
> devices a stripe occupies, it seems like there might be possibilities
> for RAID0/10 as well.
>
> Note that, as of writing this, I believe RAID0/10/5/6 make the stripe
> as wide as the number of storage devices available to the filesystem.
> If I am wrong about this, please ignore my jabbering and move on.

That's correct. I previously attempted to come up with a fix using the
device grouping method. If there's a convincing and more generic way to
specify how the devices should be grouped, we could consider that.

Thanks, Anand
On Mon, Oct 21, 2024 at 11:36:10PM +0800, Anand Jain wrote:
>> I think it's safe to start with the round-robin policy, but the syntax
>> is strange - why are the [ ] mandatory? Also, please call it
>> round-robin, or 'rr' for short.
>
> I'm fine with round-robin. The [ ] part is not mandatory; if the
> min_contiguous_read value is not specified, it will default to a
> predefined value.
>
>> The default of sector size is IMHO a wrong value; switching devices
>> that often will drop the performance just because of the io request
>> overhead. From what I remember, values around 200k were reasonable, so
>> either 192k or 256k should be the default. We may also drop the
>> configurable value altogether and provide a few hard-coded sizes like
>> rr-256k, rr-512k, rr-1m, if only to drop the parsing of user strings.
>
> I'm okay with a default value of 256k. For the experimental feature, we
> can keep it configurable, allowing the opportunity to experiment with
> other values as well.

Yeah, for experimenting it makes sense to make it flexible - no need to
patch and reboot the kernel. For the final version we should settle on
some reasonable values.

> [...]
>
> If the latency on the faulty path is high enough, that path won't be
> picked at all, so it works. However, round-robin balancing is unaware
> of dynamic faults on the device path. IMO, a round-robin method that is
> latency aware (with ~20% variance) would be better.

We should not mix the faulty device handling mode into the read
balancing, at least for now. A back-off algorithm that checks the number
of failed io requests should precede the balancing.

>> To move forward with the feature, I think the round-robin and
>> preferred device id policies can be merged. I'm not sure about the
>> latency one, but if it's under experimental we can take it as is and
>> tune it later.
>
> I hope the experimental feature also means we can change the name of
> the balancing method at any time. Once we have tested a fair
> combination of block device types, we'll definitely need a method that
> can automatically tune based on the device type, which will require
> adding or dropping balancing methods accordingly.

Yes, we can change the names. The automatic tuning would need some
feedback that measures the load and tries to improve the throughput;
this is where we got stuck last time. So for now let's do some
straightforward policy that on average works better than the current pid
policy. I hope that round-robin-256k can be a good default, but of course
we need more data for that.
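The back-off gate David describes could sit in front of any balancing
policy: a mirror whose recent read-error count crosses a threshold is
excluded until a cooldown expires, after which it gets one probationary
chance. The thresholds and names below are made up for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define ERR_THRESHOLD  8        /* invented values */
    #define COOLDOWN_TICKS 1000

    struct mirror_health {
            unsigned int recent_errors;
            uint64_t retry_after;   /* tick when the mirror may be retried */
    };

    static void mirror_record_error(struct mirror_health *m, uint64_t now)
    {
            if (++m->recent_errors >= ERR_THRESHOLD)
                    m->retry_after = now + COOLDOWN_TICKS;
    }

    /* Called before the balancing decision; a mirror that failed too
     * often is skipped until its cooldown expires. */
    static bool mirror_usable(struct mirror_health *m, uint64_t now)
    {
            if (m->recent_errors >= ERR_THRESHOLD) {
                    if (now < m->retry_after)
                            return false;   /* still backing off */
                    m->recent_errors = 0;   /* probationary chance */
            }
            return true;
    }

    int main(void)
    {
            struct mirror_health m = { 0 };

            for (int i = 0; i < ERR_THRESHOLD; i++)
                    mirror_record_error(&m, 0);

            printf("usable at t=0:    %d\n", mirror_usable(&m, 0));    /* 0 */
            printf("usable at t=1000: %d\n", mirror_usable(&m, 1000)); /* 1 */
            return 0;
    }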
On 22/10/24 02:42, David Sterba wrote:
> On Mon, Oct 21, 2024 at 11:36:10PM +0800, Anand Jain wrote:
>> [...]
>
> Yes, we can change the names. The automatic tuning would need some
> feedback that measures the load and tries to improve the throughput;
> this is where we got stuck last time. So for now let's do some
> straightforward policy that on average works better than the current
> pid policy. I hope that round-robin-256k can be a good default, but of
> course we need more data for that.

Sending v3 with rotation renamed to round-robin. Code review appreciated;
I'll wait a day.

Thanks, Anand
On 21.10.24 16:32, waxhead wrote:
> Note that, as of writing this, I believe RAID0/10/5/6 make the stripe
> as wide as the number of storage devices available to the filesystem.
> If I am wrong about this, please ignore my jabbering and move on.

Nope, you're correct, and this is a huge problem for bigger (in number of
drives) arrays. But it's also on my list of things I want to change in
how we handle RAID with the RAID stripe-tree. This way we can do
declustered RAID and ease rebuild times.

Also, we can drastically enhance write parallelism to an array by
directing different write streams to different sets of stripes. Which,
btw, at the moment isn't even done for RAID1, as we're picking one
block-group at a time until it's full, which then gets written, instead
of creating new block groups on new drive sets for different write
streams (i.e. different files, etc.).
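As a toy model of the width-limited, declustered layout sketched here:
each stripe uses only a fixed subset of the array's devices, and the
starting device rotates per stripe so that load and rebuild work spread
across all drives. This illustrates only the placement idea, not RAID
stripe-tree internals or any on-disk format; the constants are invented.

    #include <stdio.h>

    #define NUM_DEVS     8  /* devices in the array */
    #define STRIPE_WIDTH 4  /* devices any one stripe may span */

    /* Rotate the starting device per stripe (declustering), so
     * consecutive stripes land on different device subsets. */
    static void stripe_devices(int stripe_nr, int out[STRIPE_WIDTH])
    {
            int start = (stripe_nr * STRIPE_WIDTH) % NUM_DEVS;

            for (int i = 0; i < STRIPE_WIDTH; i++)
                    out[i] = (start + i) % NUM_DEVS;
    }

    int main(void)
    {
            int devs[STRIPE_WIDTH];

            for (int s = 0; s < 4; s++) {
                    stripe_devices(s, devs);
                    printf("stripe %d -> devices", s);
                    for (int i = 0; i < STRIPE_WIDTH; i++)
                            printf(" %d", devs[i]);
                    printf("\n");
            }
            return 0;
    }

A read of any single stripe then touches at most STRIPE_WIDTH devices,
which is the property waxhead asks for above.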
On 2024/10/11 13:19, Anand Jain wrote:
> v2:
> 1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of
>    CONFIG_BTRFS_DEBUG.
> 2. Correct the typo from %est_wait to %best_wait.
> 3. Initialize %best_wait to U64_MAX and remove the check for 0.
> 4. Implement rotation with a minimum contiguous read threshold before
>    switching to the next stripe. Configure this using:
>
>      echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy
>
>    The default value is the sector size, and the min_contiguous_read
>    value must be a multiple of the sector size.
>
> 5. Tested FIO random read/write and defrag compression workloads with
>    min_contiguous_read set to sector size, 192k, and 256k.
>
> [...]

Reviewed-by: Qu Wenruo <wqu@suse.com>

Although I'm not 100% happy with the min_contiguous_read setting, since
it's an optional one and still experimental, I'm fine with the series so
far.

Just want to express my concern about going with a mount option. I know
sysfs is not a good way to set up a lot of features, but a mount option
is way too committed for me, even under experimental features.

But I also understand that without a mount option it can be pretty hard
to set up the read policy for fstests runs.

So I'd prefer to have some on-disk solution (XATTR or temporary items) to
save the read policy. It's less committed compared to a mount option
(aka, much easier to revert the change without breaking any
compatibility), and can help future features.

Thanks,
Qu