
fstests: btrfs: enhance regression test for nocsum dio read's repair

Message ID 20170814070313.20092-1-lufq.fnst@cn.fujitsu.com (mailing list archive)
State New, archived

Commit Message

Lu Fengqi Aug. 14, 2017, 7:03 a.m. UTC
I caught the following error in dmesg when this test case failed.

[17446.661127] Buffer I/O error on dev sdb1, logical block 64, async page read

We expect to inject disk IO errors on the device only when xfs_io reads
the specific file, but other processes may trigger the IO error earlier.
So we can use task-filter to limit the injected errors to the xfs_io
process.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
---
 tests/btrfs/142 | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

Comments

Eryu Guan Aug. 15, 2017, 9:16 a.m. UTC | #1
On Mon, Aug 14, 2017 at 03:03:13PM +0800, Lu Fengqi wrote:
> I caught the following error in dmesg when this test case failed.
> 
> [17446.661127] Buffer I/O error on dev sdb1, logical block 64, async page read
> 
> We expect to inject disk IO errors on the device only when xfs_io reads
> the specific file, but other processes may trigger the IO error earlier.
> So we can use task-filter to limit the injected errors to the xfs_io
> process.
> 
> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>

This looks OK to me. Does btrfs/143 need a similar fix?

Thanks,
Eryu
Lu Fengqi Aug. 15, 2017, 9:54 a.m. UTC | #2
On Tue, Aug 15, 2017 at 05:16:06PM +0800, Eryu Guan wrote:
>On Mon, Aug 14, 2017 at 03:03:13PM +0800, Lu Fengqi wrote:
>> I caught the following error in dmesg when this test case failed.
>> 
>> [17446.661127] Buffer I/O error on dev sdb1, logical block 64, async page read
>> 
>> We expect to inject disk IO errors on the device only when xfs_io reads
>> the specific file, but other processes may trigger the IO error earlier.
>> So we can use task-filter to limit the injected errors to the xfs_io
>> process.
>> 
>> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
>
>This looks OK to me. Does btrfs/143 need a similar fix?
>
>Thanks,
>Eryu
>
>

Although btrfs/143 has a similar problem, this method doesn't work for it.
Unlike in btrfs/142, the second IO error there has to be triggered by another
process, not by xfs_io, so simply setting task-filter can't solve the problem.

I'm still investigating a way to solve the problem.

Do you have any ideas?
Any help will be greatly appreciated.

Patch

diff --git a/tests/btrfs/142 b/tests/btrfs/142
index 414af1b2..5bd8b728 100755
--- a/tests/btrfs/142
+++ b/tests/btrfs/142
@@ -75,6 +75,7 @@  start_fail()
 {
 	echo 100 > $DEBUGFS_MNT/fail_make_request/probability
 	echo 2 > $DEBUGFS_MNT/fail_make_request/times
+	echo 1 > $DEBUGFS_MNT/fail_make_request/task-filter
 	echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
 	echo 1 > $SYSFS_BDEV/make-it-fail
 }
@@ -83,6 +84,7 @@  stop_fail()
 {
 	echo 0 > $DEBUGFS_MNT/fail_make_request/probability
 	echo 0 > $DEBUGFS_MNT/fail_make_request/times
+	echo 0 > $DEBUGFS_MNT/fail_make_request/task-filter
 	echo 0 > $SYSFS_BDEV/make-it-fail
 }
 
@@ -118,16 +120,17 @@  echo "step 3......repair the bad copy" >>$seqres.full
 # since raid1 consists of two copies, and the bad copy was put on stripe #1
 # while the good copy lies on stripe #0, the bad copy only gets access when the
 # reader's pid % 2 == 1 is true
-while true; do
-	# start_fail only fails the following dio read so the repair is
+start_fail
+while [[ -z ${result} ]]; do
+	# task-filter makes only the following dio read fail, so the repair is
 	# supposed to work.
-	start_fail
-	$XFS_IO_PROG -d -c "pread -b 128K 0 128K" "$SCRATCH_MNT/foobar" > /dev/null &
-	pid=$!
-	wait
-	stop_fail
-	[ $((pid % 2)) == 1 ] && break
+	result=$(bash -c "
+	if [[ \$((\$\$ % 2)) -eq 1 ]]; then
+		echo 1 > /proc/\$\$/make-it-fail
+		exec $XFS_IO_PROG -d -c \"pread -b 128K 0 128K\" \"$SCRATCH_MNT/foobar\"
+	fi");
 done
+stop_fail
 
 _scratch_unmount
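
The retry loop in the last hunk can be tried in isolation. The following is
only a sketch under stated assumptions, not part of the patch: `echo` stands
in for the xfs_io direct-IO read, and the write to /proc/$$/make-it-fail is
left as a comment so the script runs without root or a fault-injection
kernel. It shows the two things the loop relies on: a child shell that does
nothing unless its own PID is odd, and `exec`, which replaces the shell
without changing the PID.

```shell
#!/bin/bash
# Sketch: keep spawning child shells until one lands on an odd PID.
# A child with an even PID prints nothing, so result stays empty
# and the loop tries again.
result=""
while [[ -z ${result} ]]; do
	result=$(bash -c '
	if [[ $(($$ % 2)) -eq 1 ]]; then
		# The real test opts in to fault injection at this point:
		#   echo 1 > /proc/$$/make-it-fail
		# exec keeps the PID, so the command that replaces the shell
		# still has the odd PID (and would keep the flag set above).
		exec echo "read issued by pid $$"
	fi')
done
echo "$result"
```

Because the child decides for itself whether to run, the parent no longer
needs to background the reader and inspect $! afterwards, which is what the
removed lines did.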