Message ID | 20160719024402.19324-1-quwenruo@cn.fujitsu.com (mailing list archive) |
---|---|
State | Not Applicable |
On Tue, Jul 19, 2016 at 10:44:02AM +0800, Qu Wenruo wrote:
> For a fully deduped file, whose file extents all point to the same
> extent, btrfs backref walk can be very time consuming, long enough to
> trigger a soft lockup.
>
> Unfortunately, btrfs send is one of the callers of such a backref walk
> under an O(n) loop, making the total time complexity O(n^3) or more.
>
> And even worse, btrfs send will allocate memory in such a loop, triggering
> OOM on systems with small memory (<4G).
>
> This test case will check if btrfs send will cause these problems.
>
> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> To: Filipe Manana <fdmanana@gmail.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
> To Filipe:
> For the soft lockup, I will try my best to figure out some method to
> avoid such a lockup (but it will still be very time consuming though).
>
> But for the OOM problem, would you mind disabling clone/reflink
> detection in btrfs send?
>
> In fact we should really avoid doing a full backref walk inside an O(n)
> loop (just like the previous fiemap ioctl test case), and avoid any full
> backref walk if possible.
> So I'm afraid that's the only solution yet.
>
> Thanks,
> Qu
> ---
>  tests/btrfs/127     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/127.out |  3 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 93 insertions(+)
>  create mode 100755 tests/btrfs/127
>  create mode 100644 tests/btrfs/127.out
>
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index a21a80a..d9174b5 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -129,3 +129,4 @@
>  124 auto replace
>  125 auto replace
>  126 auto quick qgroup
> +127 auto clone send

This test uses $LOAD_FACTOR, so it should be in the 'stress' group. And it
hangs the latest kernel, stopping other tests from running, so I think we
can add it to the 'dangerous' group as well.

I can fix them at merge time, if there are no other major updates to be
done. (I'll let the patch sit on the list for more time, in case others
have more review comments.)

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Add Filipe to the recipients, as "To:" doesn't add him automatically.

Thanks,
Qu
At 07/19/2016 12:35 PM, Eryu Guan wrote:
> This test uses $LOAD_FACTOR, so it should be in the 'stress' group. And it
> hangs the latest kernel, stopping other tests from running, so I think we
> can add it to the 'dangerous' group as well.

Thanks for this info.
I'm completely OK with adding this test to the 'stress' and 'dangerous'
groups.

However, I'm a little curious about the meaning/standard of these groups.

Does 'dangerous' conflict with 'auto'?
In most cases a tester would just execute './check -g auto', and the
system would hang at this test case.
So I'm a little confused about the 'auto' group.

BTW, I also hope there will be some documentation explaining the standard
for these groups, so people like me can avoid wasting maintainers' time.

Thanks,
Qu
On Tue, Jul 19, 2016 at 01:42:03PM +0800, Qu Wenruo wrote:
> Thanks for this info.
> I'm completely OK with adding this test to the 'stress' and 'dangerous'
> groups.
>
> However, I'm a little curious about the meaning/standard of these groups.
>
> Does 'dangerous' conflict with 'auto'?
> In most cases a tester would just execute './check -g auto', and the
> system would hang at this test case.
> So I'm a little confused about the 'auto' group.

I quote my previous email here to explain the 'auto' group
http://www.spinics.net/lists/fstests/msg03262.html

"
I searched for Dave's explanations of the 'auto' group in his reviews, and
got the following definitions:

- it should be a valid & reliable test (it's finished and has
  deterministic output) [1]
- it passes on current upstream kernels; if it fails, it's likely to be
  resolved in the foreseeable future [2]
- it should take no longer than 5 minutes to finish [3]
"

And "The only difference between quick and auto group criteria is the
test runtime." Usually 'quick' tests finish within 30s.

The 'dangerous' group was added in commit 3f28d55c3954 ("add
freeze and dangerous groups"), and it seems that it didn't have a very
clear definition[*]. But I think any test that could hang/crash recent
kernels is considered dangerous.

* http://oss.sgi.com/archives/xfs/2012-03/msg00073.html

This test triggers a soft lockup on the latest 4.7-rc7 kernel and
prevents further tests from running, so it belongs in 'dangerous'. And this
bug will be fixed in the foreseeable future, right? So it's OK to add it to
the 'auto' group. And we can always remove the 'dangerous' group from tests
when we find they're only crashing old kernels, e.g. commit 8c94797
("ext4: move 30[1234] from the dangerous to the auto group").

For running tests, "./check -g auto -x dangerous" might fit your need.

Thanks,
Eryu
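The effect of "./check -g auto -x dangerous" on a group file can be pictured with a small filter. This is only an illustrative sketch, not the logic of the real fstests check script: the function name select_tests is made up for the example, and the group entry shown for test 127 is a hypothetical one that assumes the 'stress' and 'dangerous' groups were added as suggested above.

```shell
#!/bin/sh
# Illustrative only: NOT the real fstests ./check implementation.
#
# select_tests INCLUDE EXCLUDE
# Reads group-file lines ("<test> <group> <group> ...") on stdin and prints
# the tests that are in group INCLUDE but not in group EXCLUDE.
select_tests() {
    sel_include=$1
    sel_exclude=$2
    while read -r sel_name sel_groups; do
        # skip blank lines and comments
        case "$sel_name" in ''|'#'*) continue ;; esac
        sel_keep=no
        for g in $sel_groups; do
            if [ "$g" = "$sel_include" ]; then sel_keep=yes; fi
        done
        # exclusion is applied last, so it always wins over inclusion
        for g in $sel_groups; do
            if [ "$g" = "$sel_exclude" ]; then sel_keep=no; fi
        done
        if [ "$sel_keep" = yes ]; then echo "$sel_name"; fi
    done
    return 0
}

# Sample group file. The first three lines come from the patch context;
# the line for 127 is hypothetical.
sample_groups='124 auto replace
125 auto replace
126 auto quick qgroup
127 auto clone send stress dangerous'

# Prints 124, 125 and 126, one per line; 127 is filtered out.
printf '%s\n' "$sample_groups" | select_tests auto dangerous
```

The exclusion pass runs after the inclusion pass, which mirrors the idea that 'dangerous' acts as a filter on top of whatever other group is selected.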
At 07/20/2016 03:01 PM, Eryu Guan wrote:
> And "The only difference between quick and auto group criteria is the
> test runtime." Usually 'quick' tests finish within 30s.

Thanks for all the info, it really helps a lot.

Especially on the difference between quick and auto.

Would you mind me applying this standard to current btrfs test cases?

BTW, does the standard apply to *ALL* possible mount options or just the
*default* mount options?

For example, btrfs/011 can finish in about 5min with the default mount
options, but with 'nodatasum' it can take up to 2 hours.
So should it belong to 'auto'?

Thanks,
Qu
On Wed, Jul 20, 2016 at 03:01:00PM +0800, Eryu Guan wrote:
> For running tests, "./check -g auto -x dangerous" might fit your need.
Yes, that's precisely the way the dangerous group is intended to be
used: as an exclusion filter that gets applied to other test group
definitions.
Cheers,
Dave.
On Wed, Jul 20, 2016 at 03:40:29PM +0800, Qu Wenruo wrote:
> Would you mind me applying this standard to current btrfs test cases?

It should be applied to all test cases, regardless of the filesystem
type.

> BTW, does the standard apply to *ALL* possible mount options or just the
> *default* mount options?

Generally it applies to the default case.

> For example, btrfs/011 can finish in about 5min with the default mount
> options, but with 'nodatasum' it can take up to 2 hours.
> So should it belong to 'auto'?

Yes. Also, keep in mind that runtime is dependent on the type of
storage you are testing on. The general idea is that the
"< 30s quick, < 5m auto" rule is based on how long the test takes to
run on a local single spindle SATA drive, as that is the basic
hardware we'd expect people to be testing against. This means that a
test that takes 20s on your SSD might not be a "quick" test because
it takes 5 minutes on spinning rust....

Cheers,

Dave.
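As a rough illustration of the "< 30s quick, < 5m auto" rule, the sketch below maps a measured runtime to a suggested runtime-based grouping. The helper name suggest_group is hypothetical and not part of fstests; the exact handling of the 30s and 5 minute boundaries is an assumption, and runtime is only one of the 'auto' criteria (the test must also be reliable and pass on current upstream kernels).

```shell
#!/bin/sh
# Hypothetical helper, not part of fstests.
#
# suggest_group SECONDS
# SECONDS is the test runtime measured on the reference hardware
# (a local single spindle SATA drive), not on a fast SSD.
suggest_group() {
    sg_secs=$1
    if [ "$sg_secs" -lt 30 ]; then
        echo "auto quick"
    elif [ "$sg_secs" -le 300 ]; then
        echo "auto"
    else
        echo "too slow for auto"
    fi
}

# The same test can land in different groups depending on storage:
suggest_group 20    # measured on an SSD: looks like "auto quick"
suggest_group 300   # measured on a spinning disk: only "auto"
```

This is why the measurement should come from the slower reference hardware before a test is tagged 'quick'.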
At 07/21/2016 07:37 AM, Dave Chinner wrote:
> It should be applied to all test cases, regardless of the filesystem
> type.

It's straightforward for fs-specific test cases.

But for generic tests, I'm a little concerned about the quick/auto standard.
Should we use the result of one single fs (and which fs? I assume xfs)
or of all fs to determine the quick/auto group?

For example, generic/127 involves quite a lot of metadata operations, while
for some fs (OK, btrfs again) metadata operations are quite slow compared
to other stable fs like ext4 or xfs.

So it makes the quick/auto tag quite hard to determine.

Thanks,
Qu
On Thu, Jul 21, 2016 at 10:05:25AM +0800, Qu Wenruo wrote:
> It's straightforward for fs-specific test cases.
>
> But for generic tests, I'm a little concerned about the quick/auto standard.
> Should we use the result of one single fs (and which fs? I assume xfs)
> or of all fs to determine the quick/auto group?

It's up to the person introducing the new test to determine how it
should be classified. If this causes problems for other people, then
they can send patches to reclassify it appropriately based on their
runtime numbers and configuration.

Historically speaking, we've tended to ignore btrfs performance
because it's been so randomly terrible. It's so often been a massive
outlier that we've generally considered btrfs performance to be a
bug that needs fixing and not something that requires the test to be
reclassified for everyone else.

> So it makes the quick/auto tag quite hard to determine.

It's quite straightforward, really. Send patches with numbers for the
tests you want reclassified, and lots of people will say "yes, I see
that too" or "no, that only takes 2s on my smallest, slowest test
machine, it should remain as a quick test".

And that's about it.

Cheers,

Dave.
diff --git a/tests/btrfs/127 b/tests/btrfs/127
new file mode 100755
index 0000000..a31a653
--- /dev/null
+++ b/tests/btrfs/127
@@ -0,0 +1,89 @@
+#! /bin/bash
+# FS QA Test 127
+#
+# Check if btrfs send can handle large deduped file, whose file extents
+# are all pointing to one extent.
+# Such file structure will cause quite large pressure to any operation which
+# iterates all backref of one extent.
+# And unfortunately, btrfs send is one of these operations, and will cause
+# softlock or OOM on systems with small memory(<4G).
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_scratch_reflink
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+nr_extents=$((4096 * $LOAD_FACTOR))
+
+# Use 128K blocksize, the default value of both deduperemove or
+# inband dedupe
+blocksize=$((128 * 1024))
+file=$SCRATCH_MNT/foobar
+
+# create the initial file, whose file extents are all point to one extent
+_pwrite_byte 0xcdcdcdcd 0 $blocksize $file | _filter_xfs_io
+
+for i in $(seq 1 $(($nr_extents - 1))); do
+	_reflink_range $file 0 $file $(($i * $blocksize)) $blocksize \
+		> /dev/null 2>&1
+done
+
+# create a RO snapshot, so we can send out the snapshot
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/ro_snap
+
+# send out the subvolume, and it will either:
+# 1) OOM since memory is allocated inside a O(n^3) loop
+# 2) Softlock since time consuming backref walk is called without scheduling.
+# the send destination is not important, just send will cause the problem
+_run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/127.out b/tests/btrfs/127.out
new file mode 100644
index 0000000..8b08bf8
--- /dev/null
+++ b/tests/btrfs/127.out
@@ -0,0 +1,3 @@
+QA output created by 127
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index a21a80a..d9174b5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -129,3 +129,4 @@
 124 auto replace
 125 auto replace
 126 auto quick qgroup
+127 auto clone send
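
To get a feel for the size of the file this test builds, the arithmetic
from the script can be run on its own. This is a sketch, assuming
LOAD_FACTOR is 1 when the harness does not set it; total_bytes,
total_mib and nr_reflinks are names introduced here for illustration
and do not appear in the patch.

```shell
#!/bin/bash
# Assumption: fall back to LOAD_FACTOR=1 when it is not set in the environment.
LOAD_FACTOR=${LOAD_FACTOR:-1}

# Same values as in the test script
nr_extents=$((4096 * LOAD_FACTOR))
blocksize=$((128 * 1024))

# Illustration-only names: logical file size once every block is reflinked
total_bytes=$((nr_extents * blocksize))
total_mib=$((total_bytes / 1024 / 1024))

# One initial write, then (nr_extents - 1) reflinks of that single extent
nr_reflinks=$((nr_extents - 1))

echo "file extents:  $nr_extents"
echo "logical size:  $total_mib MiB"
echo "reflink calls: $nr_reflinks"
```

Every one of those file extents points back to the same 128K extent, so
only one block of data is written while the number of backrefs on that
extent grows with nr_extents.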
For fully deduped file, whose file extents are all pointing to the same
extent, btrfs backref walk can be very time consuming, long enough to
trigger softlock.

Unfortunately, btrfs send is one of the caller of such backref walk
under an O(n) loop, making the total time complexity to O(n^3) or more.

And even worse, btrfs send will allocate memory in such loop, to trigger
OOM on system with small memory(<4G).

This test case will check if btrfs send will cause these problems.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
To: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
To Filipe:
For the soft lockup, I will try my best to figure out some method to
avoid such lockup (but it will still be very time consuming though).

But for the OOM problem, would you mind disabling clone/reflink
detection in btrfs send?

In fact we should really avoid doing full backref walk inside an O(n)
loop (just like previous fiemap ioctl test case), and avoid any full
backref walk if possible.
So I'm afraid that's the only solution yet.

Thanks,
Qu
---
 tests/btrfs/127     | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/127.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/btrfs/127
 create mode 100644 tests/btrfs/127.out