xfs: add test for truncate/collapse range race

Message ID	1419060301-26830-1-git-send-email-gux.fnst@cn.fujitsu.com (mailing list archive)
State	New, archived
Headers

From: Xing Gu <gux.fnst@cn.fujitsu.com>
To: <fstests@vger.kernel.org>
CC: <david@fromorbit.com>, <guaneryu@gmail.com>, <lczerner@redhat.com>,
	Xing Gu <gux.fnst@cn.fujitsu.com>
Subject: [PATCH] xfs: add test for truncate/collapse range race
Date: Sat, 20 Dec 2014 15:25:01 +0800
Message-ID: <1419060301-26830-1-git-send-email-gux.fnst@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: fstests-owner@vger.kernel.org
Precedence: bulk

Commit Message

Xing Gu Dec. 20, 2014, 7:25 a.m. UTC

This case tests the truncate/collapse range race. If
the race occurs, it will trigger a BUG_ON.

Signed-off-by: Xing Gu <gux.fnst@cn.fujitsu.com>
---
 tests/generic/039     | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/039.out |  1 +
 tests/generic/group   |  1 +
 3 files changed, 77 insertions(+)
 create mode 100755 tests/generic/039
 create mode 100644 tests/generic/039.out

Comments

Dave Chinner Dec. 24, 2014, 1:53 a.m. UTC | #1

On Sat, Dec 20, 2014 at 03:25:01PM +0800, Xing Gu wrote:
> This case tests truncate/collapse range race. If
> the race occurs, it will trigger BUG_ON.
> 
> Signed-off-by: Xing Gu <gux.fnst@cn.fujitsu.com>
> ---

What changed from the previous version?

...
> +rm -f $seqres.full
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +
> +old_bug=`dmesg | grep -c "kernel BUG"`
> +
> +testfile=$SCRATCH_MNT/file.$seq
> +# run fcollapse and truncate on the same file repeatedly and concurrently
> +for ((i=1; i <= 100; i++)); do
> +	for ((i=1; i <= 1000; i++)); do
> +		$XFS_IO_PROG -f -c 'truncate 100k' $testfile 2>> $seqres.full
> +		$XFS_IO_PROG -f -c 'fcollapse 0 16k' $testfile 2>> $seqres.full
> +	done &
> +	for ((i=1; i <= 1000; i++)); do
> +		$XFS_IO_PROG -f -c 'truncate 0' $testfile 2>> $seqres.full
> +	done &
> +done

The previous version of this ran a loop for 3 minutes, which we
talked about being too long. This loop forks 300,000 processes
and generates a 1.5MB $seqres.full file.  On my single CPU test VM 
it takes:

generic/039      302s

About 5 minutes to run, so it takes longer than the 3 minute version
of the same test we said was too long. FYI, my 16p test VM still
takes 35s to crunch through this test and it pegs all 16 CPUs to
100% usage.
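
As a rough check on the 300,000 figure, the arithmetic can be sketched in shell. count_xfs_io_runs is a hypothetical helper, not part of fstests, and it counts only xfs_io invocations, ignoring the background subshells that host each inner loop:

```shell
# Hypothetical helper: count_xfs_io_runs OUTER INNER CMDS_A CMDS_B
# OUTER and INNER are the loop bounds; CMDS_A and CMDS_B are the number of
# xfs_io commands run per iteration in each of the two background loops.
count_xfs_io_runs() {
    echo $(( $1 * $2 * ($3 + $4) ))
}

# Loop as posted: 100 outer iterations, 1000 inner iterations,
# two xfs_io commands in the first inner loop and one in the second.
count_xfs_io_runs 100 1000 2 1    # prints 300000
```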

We don't need to record the output of the xfs_io commands, so
avoiding a fork and throwing away the output such as:

	$XFS_IO_PROG -f -c 'truncate 100k' \
			-c 'fcollapse 0 16k' \
			$testfile > /dev/null 2>&1

makes the runtime on the 16p VM drop by 40% (to 22s) and by 33% (to 200s)
on the single CPU VM, but that's still too long on the smaller CPU
systems.

I think the loop iterations need to be tuned to the number of CPUs
in the system. This:

NCPUS=`$here/src/feature -o`
OUTER_LOOPS=$((10 * $NCPUS * $LOAD_FACTOR))
INNER_LOOPS=$((50 * $NCPUS * $LOAD_FACTOR))

plus the above xfs_io optimisations give a runtime of 3s on my 1p
machine and 30s on my 16p machine. That would be more acceptable
to everyone, I think.
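
Putting the two suggestions together, the revised loop might look roughly like the sketch below. This is an illustration, not the posted patch: set_loop_counts and race_loops are hypothetical names, the CPU count is passed in as an argument instead of being read from `$here/src/feature -o`, and plain while loops are used so the snippet also runs under a POSIX sh. race_loops is only defined here, not invoked, and assumes $XFS_IO_PROG is set by the test harness.

```shell
# Hypothetical helper: set_loop_counts NCPUS [LOAD_FACTOR]
# Scales both loop bounds by the CPU count and load factor, as suggested.
set_loop_counts() {
    OUTER_LOOPS=$((10 * $1 * ${2:-1}))
    INNER_LOOPS=$((50 * $1 * ${2:-1}))
}

# Hypothetical helper: race_loops TESTFILE
# Two background loops per outer iteration; each xfs_io call is one process
# and its output is discarded. Defined only; not called in this sketch.
race_loops() {
    testfile=$1
    i=0
    while [ "$i" -lt "$OUTER_LOOPS" ]; do
        (
            j=0
            while [ "$j" -lt "$INNER_LOOPS" ]; do
                $XFS_IO_PROG -f -c 'truncate 100k' -c 'fcollapse 0 16k' \
                    "$testfile" > /dev/null 2>&1
                j=$((j + 1))
            done
        ) &
        (
            j=0
            while [ "$j" -lt "$INNER_LOOPS" ]; do
                $XFS_IO_PROG -f -c 'truncate 0' "$testfile" > /dev/null 2>&1
                j=$((j + 1))
            done
        ) &
        i=$((i + 1))
    done
    wait
}

# Show how the number of xfs_io invocations scales with the CPU count.
for ncpus in 1 16; do
    set_loop_counts "$ncpus" 1
    echo "${ncpus}p: $OUTER_LOOPS x $INNER_LOOPS loops," \
        "$((OUTER_LOOPS * INNER_LOOPS * 2)) xfs_io runs"
done
```

Because both bounds scale with the CPU count, the total work grows with its square: 1,000 xfs_io runs on one CPU against 256,000 on sixteen, which seems consistent with the longer runtime reported for the 16p machine.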

> +wait
> +
> +new_bug=`dmesg | grep -c "kernel BUG"`
> +if [ $new_bug -ne $old_bug ]; then
> +	_fail "kernel bug detected, check dmesg for more information."
> +fi

A kernel bug in a process with an open file descriptor will leave
the filesystem impossible to unmount. It will hang the test and require
a reboot.  Hence there's no point in checking dmesg for a bug message
as it will be noticed by the test failing to complete.

> +status=0
> +exit
> diff --git a/tests/generic/039.out b/tests/generic/039.out
> new file mode 100644
> index 0000000..0cacac7
> --- /dev/null
> +++ b/tests/generic/039.out
> @@ -0,0 +1 @@
> +QA output created by 039

The test needs to echo something to indicate that an empty golden
output file is expected. "Silence is golden" is the usual phrase
here....
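
For illustration, the convention could be sketched as follows. test_body is a hypothetical stand-in for the test script, and the string comparison only mimics the harness check of test output against the .out file; the real harness does this itself.

```shell
# Assumed test number, matching the patch under review.
seq=039

# Hypothetical stand-in for the test: the race loops would run between the
# two echo lines with all of their output discarded.
test_body() {
    echo "QA output created by $seq"
    echo "Silence is golden"
}

# What 039.out would then contain: the header line plus the marker line.
golden=$(printf 'QA output created by %s\nSilence is golden' "$seq")

if [ "$(test_body)" = "$golden" ]; then
    echo "output matches golden file"
fi
```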

>  036 auto aio rw stress
>  037 metadata auto quick
>  038 auto stress
> +039 auto metadata rw

With the addition of $LOAD_FACTOR, this can be added to the stress
group as well.

Cheers,

Dave.

Xing Gu Dec. 25, 2014, 7:35 a.m. UTC | #2

On 12/24/2014 09:53 AM, Dave Chinner wrote:
> On Sat, Dec 20, 2014 at 03:25:01PM +0800, Xing Gu wrote:
>> This case tests truncate/collapse range race. If
>> the race occurs, it will trigger BUG_ON.
>>
>> Signed-off-by: Xing Gu <gux.fnst@cn.fujitsu.com>
>> ---
>
> What changed from the previous version?
>

Compared with the previous version, there are two main changes:
(1) The description of the previous version was not clear, since this
patch only checks for the truncate/collapse range race. I changed the
description.
(2) Since test machines differ in performance, it is not reasonable to
run the loop for a fixed time (e.g. 3 minutes in the previous version).
I changed the loop to use a fixed number of iterations.

> ...
>> +rm -f $seqres.full
>> +_scratch_mkfs >>$seqres.full 2>&1
>> +_scratch_mount
>> +
>> +old_bug=`dmesg | grep -c "kernel BUG"`
>> +
>> +testfile=$SCRATCH_MNT/file.$seq
>> +# run fcollapse and truncate on the same file repeatedly and concurrently
>> +for ((i=1; i <= 100; i++)); do
>> +	for ((i=1; i <= 1000; i++)); do
>> +		$XFS_IO_PROG -f -c 'truncate 100k' $testfile 2>> $seqres.full
>> +		$XFS_IO_PROG -f -c 'fcollapse 0 16k' $testfile 2>> $seqres.full
>> +	done &
>> +	for ((i=1; i <= 1000; i++)); do
>> +		$XFS_IO_PROG -f -c 'truncate 0' $testfile 2>> $seqres.full
>> +	done &
>> +done
>
> The previous version of this ran a loop for 3 minutes, which we
> talked about being too long. This loop forks 300,000 processes
> and generates a 1.5MB $seqres.full file.  On my single CPU test VM
> it takes:
>
> generic/039      302s
>
> About 5 minutes to run, so it takes longer than the 3 minute version
> of the same test we said was too long. FYI, my 16p test VM still
> takes 35s to crunch through this test and it pegs all 16 CPUs to
> 100% usage.
>
> We don't need to record the output of the xfs_io commands, so
> avoiding a fork and throwing away the output such as:
>
> 	$XFS_IO_PROG -f -c 'truncate 100k' \
> 			-c 'fcollapse 0 16k' \
> 			$testfile > /dev/null 2>&1
>
> makes the runtime on the 16p VM drop by 40% (to 22s) and by 33% (to 200s)
> on the single CPU VM, but that's still too long on the smaller CPU
> systems.
>
> I think the loop iterations need to be tuned to the number of CPUs
> in the system. This:
>
> NCPUS=`$here/src/feature -o`
> OUTER_LOOPS=$((10 * $NCPUS * $LOAD_FACTOR))
> INNER_LOOPS=$((50 * $NCPUS * $LOAD_FACTOR))
>
> plus the above xfs_io optimisations give a runtime of 3s on my 1p
> machine and 30s on my 16p machine. That would be more acceptable
> to everyone, I think.
>

Got it.

>> +wait
>> +
>> +new_bug=`dmesg | grep -c "kernel BUG"`
>> +if [ $new_bug -ne $old_bug ]; then
>> +	_fail "kernel bug detected, check dmesg for more information."
>> +fi
>
> A kernel bug in a process with an open file descriptor will leave
> the filesystem impossible to unmount. It will hang the test and require
> a reboot.  Hence there's no point in checking dmesg for a bug message
> as it will be noticed by the test failing to complete.
>

Got it.

>> +status=0
>> +exit
>> diff --git a/tests/generic/039.out b/tests/generic/039.out
>> new file mode 100644
>> index 0000000..0cacac7
>> --- /dev/null
>> +++ b/tests/generic/039.out
>> @@ -0,0 +1 @@
>> +QA output created by 039
>
> The test needs to echo something to indicate that an empty golden
> output file is expected. "Silence is golden" is the usual phrase
> here....
>

Got it.

>>   036 auto aio rw stress
>>   037 metadata auto quick
>>   038 auto stress
>> +039 auto metadata rw
>
> With the addition of $LOAD_FACTOR, this can be added to the stress
> group as well.
>


Got it.
Thanks for your suggestion!

Regards,
Xing Gu

> Cheers,
>
> Dave.
>

diff --git a/tests/generic/039 b/tests/generic/039
new file mode 100755
index 0000000..a09df43
--- /dev/null
+++ b/tests/generic/039
@@ -0,0 +1,75 @@ 
+#! /bin/bash
+# FS QA Test No. 039
+#
+# Test truncate/collapse range race.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2014 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+    rm -f $tmp.*
+}
+
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_scratch
+_require_xfs_io_command "fcollapse"
+
+rm -f $seqres.full
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+old_bug=`dmesg | grep -c "kernel BUG"`
+
+testfile=$SCRATCH_MNT/file.$seq
+# run fcollapse and truncate on the same file repeatedly and concurrently
+for ((i=1; i <= 100; i++)); do
+	for ((i=1; i <= 1000; i++)); do
+		$XFS_IO_PROG -f -c 'truncate 100k' $testfile 2>> $seqres.full
+		$XFS_IO_PROG -f -c 'fcollapse 0 16k' $testfile 2>> $seqres.full
+	done &
+	for ((i=1; i <= 1000; i++)); do
+		$XFS_IO_PROG -f -c 'truncate 0' $testfile 2>> $seqres.full
+	done &
+done
+
+wait
+
+new_bug=`dmesg | grep -c "kernel BUG"`
+if [ $new_bug -ne $old_bug ]; then
+	_fail "kernel bug detected, check dmesg for more information."
+fi
+
+status=0
+exit
diff --git a/tests/generic/039.out b/tests/generic/039.out
new file mode 100644
index 0000000..0cacac7
--- /dev/null
+++ b/tests/generic/039.out
@@ -0,0 +1 @@ 
+QA output created by 039
diff --git a/tests/generic/group b/tests/generic/group
index 1e89848..5a3d13a 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -41,6 +41,7 @@ 
 036 auto aio rw stress
 037 metadata auto quick
 038 auto stress
+039 auto metadata rw
 053 acl repair auto quick
 062 attr udf auto quick
 068 other auto freeze dangerous stress
