[4/4] xfs/{422,517}: fix false positive failure

Message ID 20220619134657.1846292-5-amir73il@gmail.com (mailing list archive)
State New, archived
Series aborted fstests may leave frozen fs behind

Commit Message

Amir Goldstein June 19, 2022, 1:46 p.m. UTC
xfs/517 fails randomly with this error:
 QA output created by 517
 Format and populate
 Concurrent fsmap and freeze
+Terminated
 Test done

These two tests run fsstress inside the sub-shell stress_loop(),
which is run as a background job.

The sub-shell has an inner loop that exits after at least 30s.
The outer shell sleeps for 32s and then kills fsstress using killall.
If the inner sub-shell loop does not exit before fsstress is killed,
bash sub-shell prints "Terminated" to stderr and breaks golden output.
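
The effect can be reproduced outside fstests. The following is a minimal
sketch, not code from the tests: demo_loop and run_demo are hypothetical
names, and 'sleep' stands in for fsstress. The exact wording of the shell's
report varies, so the sketch only relies on it containing "Terminated".

```shell
#!/bin/bash
# Illustrative sketch only; demo_loop and run_demo are hypothetical names.
# 'sleep' stands in for fsstress, running as a foreground child of a
# background sub-shell, like the command inside stress_loop().
demo_loop() {
	# The child records its own PID, then becomes 'sleep' via exec.
	sh -c 'echo $$ > "$1"; exec sleep 30' sh "$1"
	true	# the sub-shell is still running when the child dies
}

# Kill the child with SIGTERM and print what the sub-shell wrote to stderr.
run_demo() {
	local tmpdir
	tmpdir=$(mktemp -d)
	demo_loop "$tmpdir/pid" 2> "$tmpdir/err" &
	local loop_pid=$!
	# Wait until the child has written its PID.
	until [ -s "$tmpdir/pid" ]; do sleep 0.1; done
	kill -TERM "$(cat "$tmpdir/pid")"
	wait "$loop_pid"
	cat "$tmpdir/err"
	rm -rf "$tmpdir"
}

run_demo
```

The report comes from the sub-shell that was waiting on the child, not from
the killed process itself, which is why it ends up on the test's stderr.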

There are two easy solutions to this issue:
1. The sub-shell stderr could be redirected to /dev/null or $seq.full
2. killall can use SIGINT which suppresses the "Terminated" print

The tests generic/270, generic/388, generic/475, generic/648 use
the first method, but that loses any other errors that fsstress may
report during the inner loop.

overlay/058 uses the second method (but with SIGPIPE). Use this method
to preserve other reported errors.

Alas, this is not enough to fix the false positive failure - the main
test thread needs to also wait for the background jobs to exit.
Otherwise, killall -9 in _cleanup() will cause a similar "Killed"
message in stderr.

Adding -w to killall requires moving it after the unfreeze; otherwise,
an fsstress process may be left blocked on a frozen fs and the wait will
not return.
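
The ordering constraint can be illustrated without a real frozen fs. The
sketch below is only an analogy under stated assumptions: SIGSTOP/SIGCONT
stand in for freeze/thaw, 'sleep' stands in for fsstress, and
demo_stop_term is a hypothetical name. It assumes Linux /proc to detect the
stopped state and falls back to a short delay if that check never matches.

```shell
#!/bin/bash
# Illustrative analogy only, not the fstests code: SIGSTOP/SIGCONT stand in
# for freeze/thaw. A stopped process, like one blocked on a frozen fs,
# cannot act on SIGTERM until it is resumed.
demo_stop_term() {
	local tmpdir loop_pid child_pid
	tmpdir=$(mktemp -d)

	# Background sub-shell with a foreground child, as in stress_loop().
	(
		sh -c 'echo $$ > "$1"; exec sleep 30' sh "$tmpdir/pid"
		true
	) 2> /dev/null &
	loop_pid=$!

	until [ -s "$tmpdir/pid" ]; do sleep 0.1; done
	child_pid=$(cat "$tmpdir/pid")

	kill -STOP "$child_pid"		# "freeze"
	# Wait (bounded) until the kernel reports the child as stopped.
	for _ in $(seq 50); do
		grep -q '^State:[[:space:]]*T' \
			"/proc/$child_pid/status" 2> /dev/null && break
		sleep 0.1
	done

	kill -TERM "$child_pid"		# stays pending while stopped
	sleep 0.5
	kill -0 "$child_pid" 2> /dev/null && echo "still alive while stopped"

	kill -CONT "$child_pid"		# "thaw" first ...
	wait "$loop_pid"		# ... so that waiting can return
	kill -0 "$child_pid" 2> /dev/null || echo "gone after resume"
	rm -rf "$tmpdir"
}

demo_stop_term
```

Calling wait before the SIGCONT in this sketch would block until something
else resumed the child, which mirrors waiting on fsstress before the thaw.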

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 tests/xfs/422 | 5 ++++-
 tests/xfs/517 | 5 ++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

Patch

diff --git a/tests/xfs/422 b/tests/xfs/422
index 8e9a3576..63d133f8 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -95,8 +95,11 @@  while [ "$(date +%s)" -lt $((end + 2)) ]; do
 done
 
 # ...and clean up after the loops in case they didn't do it themselves.
-$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+# First thaw fs, so fsstress can exit, then kill and wait for fsstress.
+# Use of SIGINT instead of SIGTERM suppresses the "Terminated" print
+# from the XXX_loop() bash sub-shells
 $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+$KILLALL_PROG -w -SIGINT $XFS_IO_PROG $FSSTRESS_PROG >> $seqres.full 2>&1
 
 echo "Loop finished at $(date)" >> $seqres.full
 echo "Test done"
diff --git a/tests/xfs/517 b/tests/xfs/517
index 18404248..2f52d634 100755
--- a/tests/xfs/517
+++ b/tests/xfs/517
@@ -92,8 +92,11 @@  while [ "$(date +%s)" -lt $((end + 2)) ]; do
 done
 
 # ...and clean up after the loops in case they didn't do it themselves.
-$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+# First thaw fs, so fsstress can exit, then kill and wait for fsstress.
+# Use of SIGINT instead of SIGTERM suppresses the "Terminated" print
+# from the XXX_loop() bash sub-shells
 $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+$KILLALL_PROG -w -SIGINT $XFS_IO_PROG $FSSTRESS_PROG >> $seqres.full 2>&1
 
 echo "Loop finished at $(date)" >> $seqres.full
 echo "Test done"