diff mbox

fstests: btrfs: Check false ENOSPC bug caused by incorrect metadata reserve

Message ID

20161108091515.20867-1-quwenruo@cn.fujitsu.com (mailing list archive)

State

Not Applicable

Headers

show

From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: [PATCH] fstests: btrfs: Check false ENOSPC bug caused by incorrect
	metadata reserve
Date: Tue,  8 Nov 2016 17:15:15 +0800
Message-Id: <20161108091515.20867-1-quwenruo@cn.fujitsu.com>
MIME-Version: 1.0
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Qu Wenruo Nov. 8, 2016, 9:15 a.m. UTC

Due to the fact that btrfs uses different max extent size for
compressed and non-compressed write, it calculates metadata reservation
incorrectly.

This can leads to false ENOSPC alert for compressed write.

This test case will check it by using small fs and large nodesize, and
do parallel compressed write to trigger it.

The fix is not merged and may change in the future, but the function is
good enough:
btrfs: improve inode's outstanding_extents computation
btrfs: fix false enospc for compression

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 tests/btrfs/132     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/132.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 92 insertions(+)
 create mode 100755 tests/btrfs/132
 create mode 100644 tests/btrfs/132.out

Comments

Eryu Guan Nov. 8, 2016, 10:58 a.m. UTC | #1

On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
> Due to the fact that btrfs uses different max extent size for
> compressed and non-compressed write, it calculates metadata reservation
> incorrectly.
> 
> This can leads to false ENOSPC alert for compressed write.
> 
> This test case will check it by using small fs and large nodesize, and
> do parallel compressed write to trigger it.
> 
> The fix is not merged and may change in the future, but the function is
> good enough:
> btrfs: improve inode's outstanding_extents computation
> btrfs: fix false enospc for compression
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  tests/btrfs/132     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/132.out |  2 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 92 insertions(+)
>  create mode 100755 tests/btrfs/132
>  create mode 100644 tests/btrfs/132.out
> 
> diff --git a/tests/btrfs/132 b/tests/btrfs/132
> new file mode 100755
> index 0000000..95c21ea
> --- /dev/null
> +++ b/tests/btrfs/132
> @@ -0,0 +1,89 @@
> +#! /bin/bash
> +# FS QA Test 132
> +#
> +# Check if false ENOSPC will happen when parallel buffer write happens
> +# The problem is caused by incorrect metadata reservation.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +
> +# Use small filesystem with maximum nodesize.
> +# Since the false ENOSPC happens due to incorrect metadata reservation,
> +# larger nodesize and small fs will make it much easier to reproduce
> +_scratch_mkfs -b 512M -n 64k >> $seqres.full 2>&1

How about override MKFS_OPTIONS and call _scratch_mkfs_sized? e.g.

export MKFS_OPTIONS="-n 64k"
_scratch_mkfs_sized $((512 * 1024 * 1024))

> +_scratch_mount "-o compress"
> +
> +sleep_time=$(($TIME_FACTOR * 15))
> +loop_writer()
> +{
> +	offset=$1
> +	len=$2
> +	file=$3
> +
> +	while true;
> +	do
> +		# Only need stderr, and we need to specify small block size
> +		# but still trigger compression
> +		xfs_io -c "pwrite -b 8K $offset $len" $file 2>&1 > /dev/null

Use $XFS_IO_PROG here.

> +	done
> +}
> +
> +touch $SCRATCH_MNT/testfile
> +
> +# Start 2 writter writting into the same file
> +# The file is 142M, which is smaller than 1/2 of the filesystem,
> +# so no other cause can lead to ENOSPC.
> +loop_writer 0 128M $SCRATCH_MNT/testfile &
> +loop_writer 128M 16M $SCRATCH_MNT/testfile &
> +
> +sleep $sleep_time
> +
> +kill 0
> +sleep

This is not working for me, even I removed the suspicious "sleep", seems
"132" itself was killed too.

I'd save pids of background jobs and kill the pids and call wait to make
sure all child processes exit running. e.g.

loop_writer ... &
pids=$!
loop_writer ... &
pids="$pids $!"

sleep $sleep_time
kill $pids
wait

And you're missing 'echo "Silence is golden"' in the test.

Thanks,
Eryu

> +
> +status=0
> +exit
> diff --git a/tests/btrfs/132.out b/tests/btrfs/132.out
> new file mode 100644
> index 0000000..b096312
> --- /dev/null
> +++ b/tests/btrfs/132.out
> @@ -0,0 +1,2 @@
> +QA output created by 132
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index c090604..ec8ad80 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -134,3 +134,4 @@
>  129 auto quick send
>  130 auto clone send
>  131 auto quick
> +132 auto compress
> -- 
> 2.7.4
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Nov. 9, 2016, 12:24 a.m. UTC | #2

At 11/08/2016 06:58 PM, Eryu Guan wrote:
> On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
>> Due to the fact that btrfs uses different max extent size for
>> compressed and non-compressed write, it calculates metadata reservation
>> incorrectly.
>>
>> This can leads to false ENOSPC alert for compressed write.
>>
>> This test case will check it by using small fs and large nodesize, and
>> do parallel compressed write to trigger it.
>>
>> The fix is not merged and may change in the future, but the function is
>> good enough:
>> btrfs: improve inode's outstanding_extents computation
>> btrfs: fix false enospc for compression
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>  tests/btrfs/132     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/132.out |  2 ++
>>  tests/btrfs/group   |  1 +
>>  3 files changed, 92 insertions(+)
>>  create mode 100755 tests/btrfs/132
>>  create mode 100644 tests/btrfs/132.out
>>
>> diff --git a/tests/btrfs/132 b/tests/btrfs/132
>> new file mode 100755
>> index 0000000..95c21ea
>> --- /dev/null
>> +++ b/tests/btrfs/132
>> @@ -0,0 +1,89 @@
>> +#! /bin/bash
>> +# FS QA Test 132
>> +#
>> +# Check if false ENOSPC will happen when parallel buffer write happens
>> +# The problem is caused by incorrect metadata reservation.
>> +#
>> +#-----------------------------------------------------------------------
>> +# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#-----------------------------------------------------------------------
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch
>> +
>> +# Use small filesystem with maximum nodesize.
>> +# Since the false ENOSPC happens due to incorrect metadata reservation,
>> +# larger nodesize and small fs will make it much easier to reproduce
>> +_scratch_mkfs -b 512M -n 64k >> $seqres.full 2>&1
>
> How about override MKFS_OPTIONS and call _scratch_mkfs_sized? e.g.
>
> export MKFS_OPTIONS="-n 64k"
> _scratch_mkfs_sized $((512 * 1024 * 1024))

Overriding MKFS_OPTIONS makes us unable to parse more mkfs options, for 
example to test some incompatible features.

So I'd just parse this mkfs options as extra options.
>
>> +_scratch_mount "-o compress"
>> +
>> +sleep_time=$(($TIME_FACTOR * 15))
>> +loop_writer()
>> +{
>> +	offset=$1
>> +	len=$2
>> +	file=$3
>> +
>> +	while true;
>> +	do
>> +		# Only need stderr, and we need to specify small block size
>> +		# but still trigger compression
>> +		xfs_io -c "pwrite -b 8K $offset $len" $file 2>&1 > /dev/null
>
> Use $XFS_IO_PROG here.

Oh, I forgot it again...
>
>> +	done
>> +}
>> +
>> +touch $SCRATCH_MNT/testfile
>> +
>> +# Start 2 writter writting into the same file
>> +# The file is 142M, which is smaller than 1/2 of the filesystem,
>> +# so no other cause can lead to ENOSPC.
>> +loop_writer 0 128M $SCRATCH_MNT/testfile &
>> +loop_writer 128M 16M $SCRATCH_MNT/testfile &
>> +
>> +sleep $sleep_time
>> +
>> +kill 0
>> +sleep

Oh, I thought I typed "wait" not "sleep".

>
> This is not working for me, even I removed the suspicious "sleep", seems
> "132" itself was killed too.
>
> I'd save pids of background jobs and kill the pids and call wait to make
> sure all child processes exit running. e.g.
>
> loop_writer ... &
> pids=$!
> loop_writer ... &
> pids="$pids $!"
>
> sleep $sleep_time
> kill $pids
> wait
>
> And you're missing 'echo "Silence is golden"' in the test.

I'll update them soon.

Thanks,
Qu
>
> Thanks,
> Eryu
>
>> +
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/132.out b/tests/btrfs/132.out
>> new file mode 100644
>> index 0000000..b096312
>> --- /dev/null
>> +++ b/tests/btrfs/132.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 132
>> +Silence is golden
>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>> index c090604..ec8ad80 100644
>> --- a/tests/btrfs/group
>> +++ b/tests/btrfs/group
>> @@ -134,3 +134,4 @@
>>  129 auto quick send
>>  130 auto clone send
>>  131 auto quick
>> +132 auto compress
>> --
>> 2.7.4
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eryu Guan Nov. 9, 2016, 9:43 a.m. UTC | #3

On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
> 
> 
> At 11/08/2016 06:58 PM, Eryu Guan wrote:
> > On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
> > > Due to the fact that btrfs uses different max extent size for
> > > compressed and non-compressed write, it calculates metadata reservation
> > > incorrectly.
> > > 
> > > This can leads to false ENOSPC alert for compressed write.
> > > 
> > > This test case will check it by using small fs and large nodesize, and
> > > do parallel compressed write to trigger it.
> > > 
> > > The fix is not merged and may change in the future, but the function is
> > > good enough:
> > > btrfs: improve inode's outstanding_extents computation
> > > btrfs: fix false enospc for compression
> > > 
> > > Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> > > ---
[snip]
> > > +
> > > +# Use small filesystem with maximum nodesize.
> > > +# Since the false ENOSPC happens due to incorrect metadata reservation,
> > > +# larger nodesize and small fs will make it much easier to reproduce
> > > +_scratch_mkfs -b 512M -n 64k >> $seqres.full 2>&1
> > 
> > How about override MKFS_OPTIONS and call _scratch_mkfs_sized? e.g.
> > 
> > export MKFS_OPTIONS="-n 64k"
> > _scratch_mkfs_sized $((512 * 1024 * 1024))
> 
> Overriding MKFS_OPTIONS makes us unable to parse more mkfs options, for
> example to test some incompatible features.
> 
> So I'd just parse this mkfs options as extra options.

We should use helpers and not open-code when helpers are available. So
we should really use _scratch_mkfs_sized here.

And "-n 64k" is only to make bug easier to reproduce, but I don't think
it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
test.

So I'd say remove "-n 64k" and test whatever mkfs options user
specified.

> > 
> > > +_scratch_mount "-o compress"

And compress mount option is not necessary to me either. btrfs compress
is a commonly tested configuration, there's no need to force the
compress option in the test. So we can test other configurations too.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Qu Wenruo Nov. 10, 2016, 12:34 a.m. UTC | #4

At 11/09/2016 05:43 PM, Eryu Guan wrote:
> On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
>>
>>
>> At 11/08/2016 06:58 PM, Eryu Guan wrote:
>>> On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
>>>> Due to the fact that btrfs uses different max extent size for
>>>> compressed and non-compressed write, it calculates metadata reservation
>>>> incorrectly.
>>>>
>>>> This can leads to false ENOSPC alert for compressed write.
>>>>
>>>> This test case will check it by using small fs and large nodesize, and
>>>> do parallel compressed write to trigger it.
>>>>
>>>> The fix is not merged and may change in the future, but the function is
>>>> good enough:
>>>> btrfs: improve inode's outstanding_extents computation
>>>> btrfs: fix false enospc for compression
>>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> ---
> [snip]
>>>> +
>>>> +# Use small filesystem with maximum nodesize.
>>>> +# Since the false ENOSPC happens due to incorrect metadata reservation,
>>>> +# larger nodesize and small fs will make it much easier to reproduce
>>>> +_scratch_mkfs -b 512M -n 64k >> $seqres.full 2>&1
>>>
>>> How about override MKFS_OPTIONS and call _scratch_mkfs_sized? e.g.
>>>
>>> export MKFS_OPTIONS="-n 64k"
>>> _scratch_mkfs_sized $((512 * 1024 * 1024))
>>
>> Overriding MKFS_OPTIONS makes us unable to parse more mkfs options, for
>> example to test some incompatible features.
>>
>> So I'd just parse this mkfs options as extra options.
>
> We should use helpers and not open-code when helpers are available. So
> we should really use _scratch_mkfs_sized here.
>
> And "-n 64k" is only to make bug easier to reproduce, but I don't think
> it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
> my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
> test.
>
> So I'd say remove "-n 64k" and test whatever mkfs options user
> specified.

I really don't like the idea to allow user to override any mount option 
to reduce the possibility.

If using 4K nodesize, the possibility to trigger the bug will be quite 
low. (default is 16K, so you still have chance to reproduce the bug)

And further more, this testcase is not a generic test, but a 
regression/pinpoint test to expose one specific bug.

So I'll put anything to improve the possibility and won't allow user 
options to override it.


>
>>>
>>>> +_scratch_mount "-o compress"
>
> And compress mount option is not necessary to me either. btrfs compress
> is a commonly tested configuration, there's no need to force the
> compress option in the test. So we can test other configurations too.

As stated above, this is a regression test, not a generic test.

So I still need the "-o compress" mount option.
We shouldn't dependent on user to specify any special mount option, 
since tester/user are never reliable.

Further more, there are already lots of test cases specifying their own 
mount option.
So I don't see there is anything wrong using specified mount option.

Thanks,
Qu
>
> Thanks,
> Eryu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Nov. 10, 2016, 2:19 a.m. UTC | #5

On Thu, Nov 10, 2016 at 08:34:20AM +0800, Qu Wenruo wrote:
> At 11/09/2016 05:43 PM, Eryu Guan wrote:
> >On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
> >>At 11/08/2016 06:58 PM, Eryu Guan wrote:
> >>>On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
> >We should use helpers and not open-code when helpers are available. So
> >we should really use _scratch_mkfs_sized here.
> >
> >And "-n 64k" is only to make bug easier to reproduce, but I don't think
> >it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
> >my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
> >test.
> >
> >So I'd say remove "-n 64k" and test whatever mkfs options user
> >specified.
> 
> I really don't like the idea to allow user to override any mount
> option to reduce the possibility.

That's not the point. Overriding mount options reduces test coverage
because it limits the test to only the exact configuration that
reproduced the bug that was seen. If a user is testing a specific
configuration, then we really want to run the test on that config.

> And further more, this testcase is not a generic test, but a
> regression/pinpoint test to expose one specific bug.

If you want to make sure that the bug doesn't return, then you need
to run the /entire test suite/ with the configuration that exposes
the problem. You wouldn't be suggesting this specific set of mount
options if users weren't using that configuration. Hence you really
need to run the entire test suite with that configuration to make
sure you haven't broken those user's systems....

And, yes, I test lots of different XFS configurations in their
entirity every day on every change I make or review, so I'm not
suggesting that you should do anything I don't already do.

Cheers,

Dave.

Qu Wenruo Nov. 10, 2016, 2:42 a.m. UTC | #6

At 11/10/2016 10:19 AM, Dave Chinner wrote:
> On Thu, Nov 10, 2016 at 08:34:20AM +0800, Qu Wenruo wrote:
>> At 11/09/2016 05:43 PM, Eryu Guan wrote:
>>> On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
>>>> At 11/08/2016 06:58 PM, Eryu Guan wrote:
>>>>> On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
>>> We should use helpers and not open-code when helpers are available. So
>>> we should really use _scratch_mkfs_sized here.
>>>
>>> And "-n 64k" is only to make bug easier to reproduce, but I don't think
>>> it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
>>> my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
>>> test.
>>>
>>> So I'd say remove "-n 64k" and test whatever mkfs options user
>>> specified.
>>
>> I really don't like the idea to allow user to override any mount
>> option to reduce the possibility.
>
> That's not the point. Overriding mount options reduces test coverage
> because it limits the test to only the exact configuration that
> reproduced the bug that was seen. If a user is testing a specific
> configuration, then we really want to run the test on that config.
>
>> And further more, this testcase is not a generic test, but a
>> regression/pinpoint test to expose one specific bug.
>
> If you want to make sure that the bug doesn't return, then you need
> to run the /entire test suite/ with the configuration that exposes
> the problem. You wouldn't be suggesting this specific set of mount
> options if users weren't using that configuration. Hence you really
> need to run the entire test suite with that configuration to make
> sure you haven't broken those user's systems....
>
> And, yes, I test lots of different XFS configurations in their
> entirity every day on every change I make or review, so I'm not
> suggesting that you should do anything I don't already do.

OK, most of your points makes sense.
I'll update the case.

And I want to make it clear, doesn it mean, newly submitted test case 
shouldn't specify any mkfs/mount option?

Another concern is, I'm not sure if there is anyone really runs all 
btrfs mount and mkfs options on the entire test suite.
(The same is for developers who submits generic test cases)

Or there won't be so many generic or even btrfs specific test cases 
causing false alert or exposing new bugs of btrfs.

Although this is problem of btrfs. :(

Thanks,
Qu

>
> Cheers,
>
> Dave.
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eryu Guan Nov. 10, 2016, 3:24 a.m. UTC | #7

On Thu, Nov 10, 2016 at 10:42:36AM +0800, Qu Wenruo wrote:
> 
> 
> At 11/10/2016 10:19 AM, Dave Chinner wrote:
> > On Thu, Nov 10, 2016 at 08:34:20AM +0800, Qu Wenruo wrote:
> > > At 11/09/2016 05:43 PM, Eryu Guan wrote:
> > > > On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
> > > > > At 11/08/2016 06:58 PM, Eryu Guan wrote:
> > > > > > On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
> > > > We should use helpers and not open-code when helpers are available. So
> > > > we should really use _scratch_mkfs_sized here.
> > > > 
> > > > And "-n 64k" is only to make bug easier to reproduce, but I don't think
> > > > it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
> > > > my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
> > > > test.
> > > > 
> > > > So I'd say remove "-n 64k" and test whatever mkfs options user
> > > > specified.
> > > 
> > > I really don't like the idea to allow user to override any mount
> > > option to reduce the possibility.
> > 
> > That's not the point. Overriding mount options reduces test coverage
> > because it limits the test to only the exact configuration that
> > reproduced the bug that was seen. If a user is testing a specific
> > configuration, then we really want to run the test on that config.
> > 
> > > And further more, this testcase is not a generic test, but a
> > > regression/pinpoint test to expose one specific bug.
> > 
> > If you want to make sure that the bug doesn't return, then you need
> > to run the /entire test suite/ with the configuration that exposes
> > the problem. You wouldn't be suggesting this specific set of mount
> > options if users weren't using that configuration. Hence you really
> > need to run the entire test suite with that configuration to make
> > sure you haven't broken those user's systems....
> > 
> > And, yes, I test lots of different XFS configurations in their
> > entirity every day on every change I make or review, so I'm not
> > suggesting that you should do anything I don't already do.
> 
> OK, most of your points makes sense.
> I'll update the case.
> 
> And I want to make it clear, doesn it mean, newly submitted test case
> shouldn't specify any mkfs/mount option?

My understanding is that if test needs a extremely rare configuration to
hit a bug, then it should specify mkfs/mount options, e.g. xfs/297 tests
a bug in XFS only in very specific configuration:

_scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=5120b >/dev/null 2>&1

But if the bug can be reproduced by a commonly tested configuration,
e.g. btrfs with compress enabled, we should keep test coverage as large
as possible, not limit it to a certain config.

The purpose is to not lose any test coverage if possible, unless you
have a good reason to do so.

> 
> Another concern is, I'm not sure if there is anyone really runs all btrfs
> mount and mkfs options on the entire test suite.
> (The same is for developers who submits generic test cases)

It's not realistic to run full matrix of conbinations of mkfs and mount
options, but the commonly used configurations should be tested.

I think this also depends, if you care about a certain configuration, or
a distro provides full support to a certain configuration and you work
for the company behind it, as a developer or tester you should really
test it. e.g. (as a negative example) I rarely test ext4 with
"data=journal" mount option, because Red Hat doesn't support this
configuration :)

Thanks,
Eryu

> 
> Or there won't be so many generic or even btrfs specific test cases causing
> false alert or exposing new bugs of btrfs.
> 
> Although this is problem of btrfs. :(
> 
> Thanks,
> Qu
> 
> > 
> > Cheers,
> > 
> > Dave.
> > 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Nov. 10, 2016, 4:06 a.m. UTC | #8

On Thu, Nov 10, 2016 at 11:24:34AM +0800, Eryu Guan wrote:
> On Thu, Nov 10, 2016 at 10:42:36AM +0800, Qu Wenruo wrote:
> > 
> > 
> > At 11/10/2016 10:19 AM, Dave Chinner wrote:
> > > On Thu, Nov 10, 2016 at 08:34:20AM +0800, Qu Wenruo wrote:
> > > > At 11/09/2016 05:43 PM, Eryu Guan wrote:
> > > > > On Wed, Nov 09, 2016 at 08:24:38AM +0800, Qu Wenruo wrote:
> > > > > > At 11/08/2016 06:58 PM, Eryu Guan wrote:
> > > > > > > On Tue, Nov 08, 2016 at 05:15:15PM +0800, Qu Wenruo wrote:
> > > > > We should use helpers and not open-code when helpers are available. So
> > > > > we should really use _scratch_mkfs_sized here.
> > > > > 
> > > > > And "-n 64k" is only to make bug easier to reproduce, but I don't think
> > > > > it's necessary. In my testings, 50%-75% runs hit the ENOSPC failure on
> > > > > my x86_64 test vm, even I removed the "-n 64k" mkfs option from the
> > > > > test.
> > > > > 
> > > > > So I'd say remove "-n 64k" and test whatever mkfs options user
> > > > > specified.
> > > > 
> > > > I really don't like the idea to allow user to override any mount
> > > > option to reduce the possibility.
> > > 
> > > That's not the point. Overriding mount options reduces test coverage
> > > because it limits the test to only the exact configuration that
> > > reproduced the bug that was seen. If a user is testing a specific
> > > configuration, then we really want to run the test on that config.
> > > 
> > > > And further more, this testcase is not a generic test, but a
> > > > regression/pinpoint test to expose one specific bug.
> > > 
> > > If you want to make sure that the bug doesn't return, then you need
> > > to run the /entire test suite/ with the configuration that exposes
> > > the problem. You wouldn't be suggesting this specific set of mount
> > > options if users weren't using that configuration. Hence you really
> > > need to run the entire test suite with that configuration to make
> > > sure you haven't broken those user's systems....
> > > 
> > > And, yes, I test lots of different XFS configurations in their
> > > entirity every day on every change I make or review, so I'm not
> > > suggesting that you should do anything I don't already do.
> > 
> > OK, most of your points makes sense.
> > I'll update the case.
> > 
> > And I want to make it clear, doesn it mean, newly submitted test case
> > shouldn't specify any mkfs/mount option?
> 
> My understanding is that if test needs a extremely rare configuration to
> hit a bug, then it should specify mkfs/mount options, e.g. xfs/297 tests
> a bug in XFS only in very specific configuration:
> 
> _scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=5120b >/dev/null 2>&1

Yes, but there's a key reason we can do this for /XFS/. I point this
out every time this comes up:

	We can do this with /XFS/ because the *implementation of
	_scratch_mkfs_xfs()* has a fallback strategy on failure.

That is, _scratch_mkfs_xfs will first try to make the filesystems
with $MKFS_OPTIONS + $TEST_SUPPLIED_OPTIONS. If that fails, it then
runs mkfs with just $TEST_SUPPLIED_OPTIONS.

IOWs, XFS always tries to use the global options, and only removes
them when there is a problem combining them with
$TEST_SUPPLIED_OPTIONS.

This is how custom test options should be applied for all
filesystems, not just XFS. It gets rid of the need for any test to
care about MKFS_OPTIONS.

What we are also missing is that we need to apply this process to
scratch mount options as we don't do it for any filesystem.
That will get rid of the need for tests to care about what is in
MOUNT_OPTIONS, too...

Cheers,

Dave.

diff mbox

Patch

diff --git a/tests/btrfs/132 b/tests/btrfs/132
new file mode 100755
index 0000000..95c21ea
--- /dev/null
+++ b/tests/btrfs/132
@@ -0,0 +1,89 @@ 
+#! /bin/bash
+# FS QA Test 132
+#
+# Check if false ENOSPC will happen when parallel buffer write happens
+# The problem is caused by incorrect metadata reservation.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+# Use small filesystem with maximum nodesize.
+# Since the false ENOSPC happens due to incorrect metadata reservation,
+# larger nodesize and small fs will make it much easier to reproduce
+_scratch_mkfs -b 512M -n 64k >> $seqres.full 2>&1
+_scratch_mount "-o compress"
+
+sleep_time=$(($TIME_FACTOR * 15))
+loop_writer()
+{
+	offset=$1
+	len=$2
+	file=$3
+
+	while true;
+	do
+		# Only need stderr, and we need to specify small block size
+		# but still trigger compression
+		xfs_io -c "pwrite -b 8K $offset $len" $file 2>&1 > /dev/null
+	done
+}
+
+touch $SCRATCH_MNT/testfile
+
+# Start 2 writter writting into the same file
+# The file is 142M, which is smaller than 1/2 of the filesystem,
+# so no other cause can lead to ENOSPC.
+loop_writer 0 128M $SCRATCH_MNT/testfile &
+loop_writer 128M 16M $SCRATCH_MNT/testfile &
+
+sleep $sleep_time
+
+kill 0
+sleep
+
+status=0
+exit
diff --git a/tests/btrfs/132.out b/tests/btrfs/132.out
new file mode 100644
index 0000000..b096312
--- /dev/null
+++ b/tests/btrfs/132.out
@@ -0,0 +1,2 @@ 
+QA output created by 132
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index c090604..ec8ad80 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -134,3 +134,4 @@ 
 129 auto quick send
 130 auto clone send
 131 auto quick
+132 auto compress