Message ID | 173e7faa9202a5d3438cd5bbdca765708f3bc729.1691477705.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs-progs: tests/misc/058: reduce the space requirement and speed up the test | expand |
On Tue, Aug 08, 2023 at 02:55:21PM +0800, Qu Wenruo wrote: > [BUG] > When I was testing misc/058, the fs still has around 7GiB free space, > but during that test case, btrfs kernel module reports write failures > and even git commands failed inside that fs. > > And obviously the test case failed. > > [CAUSE] > It turns out that, the test case itself would require 6GiB (4 data > disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB > free space. > > And obviously my partition is not that large and failed. The file sizes were picked so the replace is not too fast, this again depends on the system. Please add more space for tests. > [FIX] > In fact, we really don't need that much space at all. > > Our objective is to test "btrfs device replace --enqueue" functionality, > there is not much need to wait for 1 second, we can just do the enqueue > immediately. This depends on the system and the sleep might be needed if the first command does not start the first replace. The test is not testing just the --enqueue, but that two replaces can be enqueued on top each other. So we need the first one to start. > So this patch would reduce the file size to a more sane (and rounded) > 2GiB, and do the enqueue immediately. I'm not sure that the test would actually work as intended after the changes. The sleeps and dependency on system is fragile but we don't have anything better than to over allocate and provide enough time for the other commands to catch up.
On Wed, Aug 09, 2023 at 02:26:49PM +0200, David Sterba wrote: > On Tue, Aug 08, 2023 at 02:55:21PM +0800, Qu Wenruo wrote: > > [BUG] > > When I was testing misc/058, the fs still has around 7GiB free space, > > but during that test case, btrfs kernel module reports write failures > > and even git commands failed inside that fs. > > > > And obviously the test case failed. > > > > [CAUSE] > > It turns out that, the test case itself would require 6GiB (4 data > > disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB > > free space. > > > > And obviously my partition is not that large and failed. > > The file sizes were picked so the replace is not too fast, this again > depends on the system. Please add more space for tests. > > > [FIX] > > In fact, we really don't need that much space at all. > > > > Our objective is to test "btrfs device replace --enqueue" functionality, > > there is not much need to wait for 1 second, we can just do the enqueue > > immediately. > > This depends on the system and the sleep might be needed if the first > command does not start the first replace. The test is not testing just > the --enqueue, but that two replaces can be enqueued on top each other. > So we need the first one to start. > > > So this patch would reduce the file size to a more sane (and rounded) > > 2GiB, and do the enqueue immediately. > > I'm not sure that the test would actually work as intended after the > changes. The sleeps and dependency on system is fragile but we don't > have anything better than to over allocate and provide enough time for > the other commands to catch up. The reduced test still reliably verifies the fix so I'll apply it. Thanks.
On Tue, Aug 08, 2023 at 02:55:21PM +0800, Qu Wenruo wrote: > [BUG] > When I was testing misc/058, the fs still has around 7GiB free space, > but during that test case, btrfs kernel module reports write failures > and even git commands failed inside that fs. > > And obviously the test case failed. > > [CAUSE] > It turns out that, the test case itself would require 6GiB (4 data > disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB > free space. > > And obviously my partition is not that large and failed. > > [FIX] > In fact, we really don't need that much space at all. > > Our objective is to test "btrfs device replace --enqueue" functionality, > there is not much need to wait for 1 second, we can just do the enqueue > immediately. > > So this patch would reduce the file size to a more sane (and rounded) > 2GiB, and do the enqueue immediately. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On 2023/8/9 20:35, David Sterba wrote: > On Wed, Aug 09, 2023 at 02:26:49PM +0200, David Sterba wrote: >> On Tue, Aug 08, 2023 at 02:55:21PM +0800, Qu Wenruo wrote: >>> [BUG] >>> When I was testing misc/058, the fs still has around 7GiB free space, >>> but during that test case, btrfs kernel module reports write failures >>> and even git commands failed inside that fs. >>> >>> And obviously the test case failed. >>> >>> [CAUSE] >>> It turns out that, the test case itself would require 6GiB (4 data >>> disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB >>> free space. >>> >>> And obviously my partition is not that large and failed. >> >> The file sizes were picked so the replace is not too fast, this again >> depends on the system. Please add more space for tests. >> >>> [FIX] >>> In fact, we really don't need that much space at all. >>> >>> Our objective is to test "btrfs device replace --enqueue" functionality, >>> there is not much need to wait for 1 second, we can just do the enqueue >>> immediately. >> >> This depends on the system and the sleep might be needed if the first >> command does not start the first replace. The test is not testing just >> the --enqueue, but that two replaces can be enqueued on top each other. >> So we need the first one to start. >> >>> So this patch would reduce the file size to a more sane (and rounded) >>> 2GiB, and do the enqueue immediately. >> >> I'm not sure that the test would actually work as intended after the >> changes. The sleeps and dependency on system is fragile but we don't >> have anything better than to over allocate and provide enough time for >> the other commands to catch up. > > The reduced test still reliably verifies the fix so I'll apply it. > Thanks. Despite the merge, I still want to discuss the principle behind the test cases. Unlike fstests, we don't really have strong requirement on the disk sizes, thus most tests only go a 2GiB sparse file. This leads to very loose disk size requirement, just like this case, we can easily go 6GiB (more accurate 9GiB) without any warning or checks in advance. I believe the proper way to go in the future would be either: - Add a proper size requirement check Just like xfstests - Put more explicit recommends on the file sizes We can recommend something like doing IO for 4sec, and only sleep for 1sec. But unfortunate this is not future proof, as modern PCIE5 drives can already go beyond 10GiB/s sequential writes. Although for this particular case, I'm wondering if it's possible to do multiple enqueue calls? E.g: btrfs replace start --enqueue 2 $replace_dev1 btrfs replace start --enqueue 2 $replace_dev2 btrfs replace start --enqueue 2 $replace_dev1 If that's possible, I'd say it's better than any of the existing method. Thanks, Qu
On Thu, Aug 10, 2023 at 09:06:32AM +0800, Qu Wenruo wrote: > On 2023/8/9 20:35, David Sterba wrote: > > On Wed, Aug 09, 2023 at 02:26:49PM +0200, David Sterba wrote: > >> On Tue, Aug 08, 2023 at 02:55:21PM +0800, Qu Wenruo wrote: > >>> [BUG] > >>> When I was testing misc/058, the fs still has around 7GiB free space, > >>> but during that test case, btrfs kernel module reports write failures > >>> and even git commands failed inside that fs. > >>> > >>> And obviously the test case failed. > >>> > >>> [CAUSE] > >>> It turns out that, the test case itself would require 6GiB (4 data > >>> disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB > >>> free space. > >>> > >>> And obviously my partition is not that large and failed. > >> > >> The file sizes were picked so the replace is not too fast, this again > >> depends on the system. Please add more space for tests. > >> > >>> [FIX] > >>> In fact, we really don't need that much space at all. > >>> > >>> Our objective is to test "btrfs device replace --enqueue" functionality, > >>> there is not much need to wait for 1 second, we can just do the enqueue > >>> immediately. > >> > >> This depends on the system and the sleep might be needed if the first > >> command does not start the first replace. The test is not testing just > >> the --enqueue, but that two replaces can be enqueued on top each other. > >> So we need the first one to start. > >> > >>> So this patch would reduce the file size to a more sane (and rounded) > >>> 2GiB, and do the enqueue immediately. > >> > >> I'm not sure that the test would actually work as intended after the > >> changes. The sleeps and dependency on system is fragile but we don't > >> have anything better than to over allocate and provide enough time for > >> the other commands to catch up. > > > > The reduced test still reliably verifies the fix so I'll apply it. > > Thanks. > > Despite the merge, I still want to discuss the principle behind the test > cases. > > Unlike fstests, we don't really have strong requirement on the disk > sizes, thus most tests only go a 2GiB sparse file. > > This leads to very loose disk size requirement, just like this case, we > can easily go 6GiB (more accurate 9GiB) without any warning or checks in > advance. > > I believe the proper way to go in the future would be either: > > - Add a proper size requirement check > Just like xfstests For tests that require some minimum size eg. to create files the minimum size would make sense and we can make them explicit in the tests but for vast majority I think we can let it 2G. Tests that work on multiple devices could make sure all the device sizes will fit on the partition in advance. This can be done by the wrappers so we would not need to change the tests. > - Put more explicit recommends on the file sizes > We can recommend something like doing IO for 4sec, and only sleep for > 1sec. For tests that depend on timing, yes. > But unfortunate this is not future proof, as modern PCIE5 drives can > already go beyond 10GiB/s sequential writes. Yeah, we'd need the tests scale to the the device throughput but either it would have to be measured before the test (and depens on the system load) or we'd have to guess the speed class by reading the device info, which might not be obvious due to layering. We can assume at least HDD based devices and SSD and scale the tests for them. NVMe devices are too fast for some tests so we won't have the coverage on them. > Although for this particular case, I'm wondering if it's possible to do > multiple enqueue calls? E.g: > > btrfs replace start --enqueue 2 $replace_dev1 > btrfs replace start --enqueue 2 $replace_dev2 > btrfs replace start --enqueue 2 $replace_dev1 > > If that's possible, I'd say it's better than any of the existing method. Right this seems to be more reliable. I designed the test by following the steps from the report, but the bug would be reproducibe by the above sequence too.
diff --git a/tests/misc-tests/058-replace-start-enqueue/test.sh b/tests/misc-tests/058-replace-start-enqueue/test.sh index 1a24d5ec7ecb..bdbc87b4090d 100755 --- a/tests/misc-tests/058-replace-start-enqueue/test.sh +++ b/tests/misc-tests/058-replace-start-enqueue/test.sh @@ -21,16 +21,15 @@ run_check_mount_test_dev run_check $SUDO_HELPER "$TOP/btrfs" device remove "$REPLACE1" "$TEST_MNT" run_check $SUDO_HELPER "$TOP/btrfs" device remove "$REPLACE2" "$TEST_MNT" -for i in `seq 48`; do +for i in `seq 16`; do run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/file$i" bs=1M count=128 status=noxfer done # Sync so replace start does not block in unwritten IO run_check "$TOP/btrfs" filesystem sync "$TEST_MNT" run_check "$TOP/btrfs" filesystem usage -T "$TEST_MNT" -# Go background, should not be that fast, estimated 10 seconds +# Go background, should not be that fast. run_check $SUDO_HELPER "$TOP/btrfs" replace start 2 "$REPLACE1" "$TEST_MNT" -run_check sleep 1 # No background, should wait run_check $SUDO_HELPER "$TOP/btrfs" replace start --enqueue 3 "$REPLACE2" "$TEST_MNT" run_check $SUDO_HELPER "$TOP/btrfs" replace status "$TEST_MNT"
[BUG] When I was testing misc/058, the fs still has around 7GiB free space, but during that test case, btrfs kernel module reports write failures and even git commands failed inside that fs. And obviously the test case failed. [CAUSE] It turns out that, the test case itself would require 6GiB (4 data disks) + 1.5GiB x 2 (the two replace target), thus it requires 9 GiB free space. And obviously my partition is not that large and failed. [FIX] In fact, we really don't need that much space at all. Our objective is to test "btrfs device replace --enqueue" functionality, there is not much need to wait for 1 second, we can just do the enqueue immediately. So this patch would reduce the file size to a more sane (and rounded) 2GiB, and do the enqueue immediately. Signed-off-by: Qu Wenruo <wqu@suse.com> --- tests/misc-tests/058-replace-start-enqueue/test.sh | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)