Message ID | 1462869581-19227-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive) |
---|---|
State | Not Applicable |
On Tue, May 10, 2016 at 9:39 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> For a completely deduped file, which means all its file extent are
> pointing to one bytenr, if calling fiemap on it, btrfs will cause soft
> hang up or just takes years long.
>
> This bug can be reproduced even without any in-band or out-of-band
> dedupe, normal clone_file_range() call can create such situation.
>
> This test case will detect it.

Why isn't this a generic test?
There's nothing btrfs specific anymore...

Thanks.

>
> Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
[...]
> +#! /bin/bash
> +# FS QA Test 028
> +#
> +# Test fiemap ioctl on heavily deduped file.
> +#
> +# This test will cause btrfs to soft hang up or takes years long to finish

Haven't tried it, but I doubt it will take years...
Are you sure that the soft lockup, which is what makes the test fail
due to the dmesg warning, is triggered on very fast machines as well?
I.e. this may not be reliable on better hardware.

[...]
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Filipe Manana wrote on 2016/05/10 11:01 +0100:
> On Tue, May 10, 2016 at 9:39 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> For a completely deduped file, which means all its file extent are
>> pointing to one bytenr, if calling fiemap on it, btrfs will cause soft
>> hang up or just takes years long.
>>
>> This bug can be reproduced even without any in-band or out-of-band
>> dedupe, normal clone_file_range() call can create such situation.
>>
>> This test case will detect it.
>
> Why isn't this a generic test?
> There's nothing btrfs specific anymore...
>
> Thanks.

I'm OK to move it to generic, just as originally planned.

BTW, do other filesystems support reflinking a file range?
I found a lot of xfs test cases using reflink, but I still can't reflink a
file range inside the same inode
---
$ xfs_io -c "reflink test.file 0 128k 128k" test.file
XFS_IOC_CLONE_RANGE: Operation not supported
---

[...]
>> +# This test will cause btrfs to soft hang up or takes years long to finish
>
> Haven't tried it, but I doubt it will take years...
> Are you sure that the soft lockup, which is what makes the test fail
> due to the dmesg warning, is triggered on very fast machines as well?
> I.e. this may not be reliable on better hardware.

It triggers on a fast test server too, using the same test case, but your
concern is valid.

The reporter initially triggered the bug on an even faster server with a
similar file layout, with 100% reproducibility, but with nr set to 8192.

I reduced nr from 8192 (which is always reproducible) to 4096 to save
some time creating the file, but considering the loop scale (at least n^3),
the halved nr seems to hugely reduce the time.

The known loop scale is n^3 ~ n^4:
1. Loop all file extents (* 4096)
2. Loop all backrefs of one extent (* 4096)
3. Loop each backref in __merge_refs (list_for_each_entry_safe_continue) (* 4096)
4. Loop to the list end in "while (eie && eie->next) { eie = eie->next; }" (* 4096)

What about changing nr to (8192 * $LOAD_FACTOR)?

Thanks,
Qu

[...]
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
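To make the scaling argument in the message above concrete, here is a minimal sketch that is not part of the patch. It only models the iteration count as nr^3 or nr^4, as described in the loop list, and shows how nr would be derived from LOAD_FACTOR under the proposal. The helper name `loop_iterations` is hypothetical, and defaulting LOAD_FACTOR to 1 is an assumption made so the snippet runs outside fstests.

```bash
# Rough model of the fiemap cost on a fully reflinked file.
# Assumption: work ~ nr^exp with exp between 3 and 4, as listed above.
# This illustrates the scaling only; it does not measure the kernel.

# loop_iterations N EXP: print N raised to the power EXP (hypothetical helper).
loop_iterations() {
	_result=1
	_i=0
	while [ "$_i" -lt "$2" ]; do
		_result=$(( _result * $1 ))
		_i=$(( _i + 1 ))
	done
	echo "$_result"
}

# LOAD_FACTOR is normally provided by fstests; default to 1 here (assumption).
LOAD_FACTOR=${LOAD_FACTOR:-1}
nr=$(( 8192 * LOAD_FACTOR ))

for n in 4096 8192; do
	echo "nr=$n: n^3=$(loop_iterations "$n" 3), n^4=$(loop_iterations "$n" 4)"
done
echo "proposed nr with LOAD_FACTOR=$LOAD_FACTOR: $nr"
```

Under this model, halving nr from 8192 to 4096 cuts the work by a factor of 8 (n^3) to 16 (n^4), which is consistent with the concern that the smaller value may not trigger the lockup on faster hardware.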
On Tue, May 10, 2016 at 04:39:41PM +0800, Qu Wenruo wrote:
> For a completely deduped file, which means all its file extent are
> pointing to one bytenr, if calling fiemap on it, btrfs will cause soft
> hang up or just takes years long.
>
> This bug can be reproduced even without any in-band or out-of-band
> dedupe, normal clone_file_range() call can create such situation.
>
> This test case will detect it.

Why is this a btrfs specific test?
On Wed, May 11, 2016 at 10:14:42AM +0800, Qu Wenruo wrote:
> BTW, do other filesystems support reflinking a file range?
> I found a lot of xfs test cases using reflink, but I still can't reflink a
> file range inside the same inode

XFS work is under development and not in mainline yet. Also NFS can
support reflinks if the server supports it, which includes a Linux
server with btrfs or the XFS patches.
On Wed, May 11, 2016 at 10:14:42AM +0800, Qu Wenruo wrote:
> Filipe Manana wrote on 2016/05/10 11:01 +0100:
> >Why isn't this a generic test?
> >There's nothing btrfs specific anymore...
> >
> >Thanks.
>
> I'm OK to move it to generic, just as originally planned.

Thank you!

> BTW, do other filesystems support reflinking a file range?

As Christoph said, future-XFS and NFS.

> I found a lot of xfs test cases using reflink, but I still can't reflink a
> file range inside the same inode
> ---
> $ xfs_io -c "reflink test.file 0 128k 128k" test.file
> XFS_IOC_CLONE_RANGE: Operation not supported

<shrug> It should work...

...and currently works for me (4.6-rc7) on both btrfs and xfs:

# rm -rf a ; dd if=/dev/zero of=a bs=131072 count=1 ; xfs_io -c 'reflink a 0 128k 128k' a ; filefrag -v a ; grep $PWD /proc/mounts
1+0 records in
1+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.000539818 s, 243 MB/s
linked 131072/131072 bytes at offset 131072
128 KiB, 1 ops; 0.0000 sec (120.077 MiB/sec and 960.6148 ops/sec)
Filesystem type is: 9123683e
File size of a is 262144 (64 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:       3088..      3119:     32:
   1:       32..      63:       3088..      3119:     32:       3120: last,eof
a: 2 extents found
/dev/sda /mnt btrfs rw,relatime,space_cache,subvolid=5,subvol=/ 0 0
# cd /opt
# rm -rf a ; dd if=/dev/zero of=a bs=131072 count=1 ; xfs_io -c 'reflink a 0 128k 128k' a ; filefrag -v a ; grep $PWD /proc/mounts
1+0 records in
1+0 records out
131072 bytes (131 kB, 128 KiB) copied, 0.00237377 s, 55.2 MB/s
linked 131072/131072 bytes at offset 131072
128 KiB, 1 ops; 0.0000 sec (87.047 MiB/sec and 696.3788 ops/sec)
Filesystem type is: 58465342
File size of a is 262144 (64 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:         24..        55:     32:             shared
   1:       32..      63:         24..        55:     32:         56: last,shared,eof
a: 2 extents found
/dev/sdb /opt xfs rw,relatime,attr2,inode64,noquota 0 0

That said, I haven't checked with latest xfsprogs master.

--D

[...]
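The two `filefrag -v` listings above differ only in the physical offsets and in the "shared" flag. As a hedged illustration, the sketch below summarizes such a listing: how many extents there are, how many distinct physical start blocks they use, and how many carry the "shared" flag. It assumes the column layout shown in the output above; `summarize_filefrag` is a hypothetical helper, not part of fstests, xfsprogs, or e2fsprogs.

```bash
# summarize_filefrag: read "filefrag -v" output on stdin and print
#   <extents> <distinct physical starts> <extents flagged shared>
# Assumption: extent rows look like the ones in the listing above.
summarize_filefrag() {
	awk '
	/^ *[0-9]+: *[0-9]+\.\./ {
		line = $0
		gsub(/\.\./, " ", line)    # split "start..end" ranges
		gsub(/:/, " ", line)       # drop the column separators
		n = split(line, f, " ")    # f[4] is the physical start block
		extents++
		if (!(f[4] in seen)) { seen[f[4]] = 1; distinct++ }
		if (f[n] ~ /(^|,)shared(,|$)/) shared++
	}
	END { printf "%d %d %d\n", extents, distinct, shared }
	'
}

# Example with the btrfs extent rows from the listing above.
summarize_filefrag <<'EOF'
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..      31:       3088..      3119:     32:
   1:       32..      63:       3088..      3119:     32:       3120: last,eof
EOF
```

For the btrfs rows this prints `2 1 0`: two extents, one physical location, and no "shared" flag reported. Feeding it the xfs rows gives `2 1 2`, which is the difference the thread goes on to discuss.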
Darrick J. Wong wrote on 2016/05/11 17:23 -0700:
[...]
> <shrug> It should work...
>
> ...and currently works for me (4.6-rc7) on both btrfs and xfs:

Oh, I'm using 4.5-rc6, which is current btrfs for-linus branch.

Thanks for your kind info! I'll try mainline kernel.

[...]
> Filesystem type is: 58465342
> File size of a is 262144 (64 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..      31:         24..        55:     32:             shared
>    1:       32..      63:         24..        55:     32:         56: last,shared,eof

Also the "shared" flag is different from btrfs, where btrfs is wrong, and
the btrfs routine to check shared extent caused the soft lockup.

I originally planned to check "shared" flag, but the soft lockup is more
important, and 8000+ output seems not suitable as golden output.

Thanks,
Qu

> a: 2 extents found
> /dev/sdb /opt xfs rw,relatime,attr2,inode64,noquota 0 0
>
> That said, I haven't checked with latest xfsprogs master.
>
> --D

[...]
On Thu, May 12, 2016 at 08:46:41AM +0800, Qu Wenruo wrote:
> >Filesystem type is: 58465342
> >File size of a is 262144 (64 blocks of 4096 bytes)
> > ext:     logical_offset:        physical_offset: length:   expected: flags:
> >   0:        0..      31:         24..        55:     32:             shared
> >   1:       32..      63:         24..        55:     32:         56: last,shared,eof
>
> Also the "shared" flag is different from btrfs, where btrfs is
> wrong, and the btrfs routine to check shared extent caused the soft
> lockup.
>
> I originally planned to check "shared" flag, but the soft lockup is
> more important, and 8000+ output seems not suitable as golden
> output.

If that's what the test produces for correct behaviour, then there
isn't any problem with having golden output that large. e.g.
tests/xfs/136.out has 7800 lines in its golden output file. There
are quite a few tests with large amounts of output:

$ find . -name *.out -exec ls -s {} \; |sort -nr |head -5
144 ./tests/xfs/136.out
124 ./tests/generic/324.out
120 ./tests/xfs/165.out
116 ./tests/xfs/107.out
92 ./tests/btrfs/034.out
$

Cheers,

Dave.
Dave Chinner wrote on 2016/05/12 11:19 +1000:
> On Thu, May 12, 2016 at 08:46:41AM +0800, Qu Wenruo wrote:
>>> Filesystem type is: 58465342
>>> File size of a is 262144 (64 blocks of 4096 bytes)
>>>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>>>    0:        0..      31:         24..        55:     32:             shared
>>>    1:       32..      63:         24..        55:     32:         56: last,shared,eof
>>
>> Also the "shared" flag is different from btrfs, where btrfs is
>> wrong, and the btrfs routine to check shared extent caused the soft
>> lockup.
>>
>> I originally planned to check "shared" flag, but the soft lockup is
>> more important, and 8000+ output seems not suitable as golden
>> output.
>
> If that's what the test produces for correct behaviour, then there
> isn't any problem with having golden output that large. e.g.
> tests/xfs/136.out has 7800 lines in its golden output file. There
> are quite a few tests with large amounts of output:
>
> $ find . -name *.out -exec ls -s {} \; |sort -nr |head -5
> 144 ./tests/xfs/136.out
> 124 ./tests/generic/324.out
> 120 ./tests/xfs/165.out
> 116 ./tests/xfs/107.out
> 92 ./tests/btrfs/034.out
> $
>
> Cheers,
>
> Dave.
>

Great, now the test case can check not only the btrfs soft lockup but also
shared flags.

Thanks,
Qu
diff --git a/tests/btrfs/028 b/tests/btrfs/028
new file mode 100755
index 0000000..62bcc9d
--- /dev/null
+++ b/tests/btrfs/028
@@ -0,0 +1,78 @@
+#! /bin/bash
+# FS QA Test 028
+#
+# Test fiemap ioctl on heavily deduped file.
+#
+# This test will cause btrfs to soft hang up or takes years long to finish
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2016 Fujitsu. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch_reflink
+
+blocksize=$(( 128 * 1024 ))
+nr=4096
+file="$SCRATCH_MNT/tmp"
+
+_scratch_mkfs
+_scratch_mount
+
+# write the initial block for later reflink
+$XFS_IO_PROG -f -c "pwrite 0 $blocksize" -c "fsync" $file | _filter_xfs_io
+
+# use reflink to create the rest of the file, whose all extents are all
+# pointing to the first extent
+for i in $(seq 1 $nr); do
+	$XFS_IO_PROG -c "reflink $file 0 $(( $i * $blocksize )) $blocksize" \
+		$SCRATCH_MNT/tmp > /dev/null || _fail "reflink failed"
+done
+
+# then call fiemap on that file, which shouldn't hang the fs by all means
+$XFS_IO_PROG -c "fiemap" $file >> $seqres.full
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/028.out b/tests/btrfs/028.out
new file mode 100644
index 0000000..2b5a9a5
--- /dev/null
+++ b/tests/btrfs/028.out
@@ -0,0 +1,3 @@
+QA output created by 028
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index da0e27f..8f6f877 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -30,6 +30,7 @@
 025 auto quick send clone
 026 auto quick compress prealloc
 027 auto replace
+028 auto clone
 029 auto quick clone
 030 auto quick send
 031 auto quick subvol clone
For a completely deduped file, which means all its file extent are
pointing to one bytenr, if calling fiemap on it, btrfs will cause soft
hang up or just takes years long.

This bug can be reproduced even without any in-band or out-of-band
dedupe, normal clone_file_range() call can create such situation.

This test case will detect it.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 tests/btrfs/028     | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/028.out |  3 +++
 tests/btrfs/group   |  1 +
 3 files changed, 82 insertions(+)
 create mode 100755 tests/btrfs/028
 create mode 100644 tests/btrfs/028.out
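To visualize the file layout the commit message describes, the following dry-run sketch prints the clone commands the test loop would issue, without touching any filesystem. It is an illustration under stated assumptions, not part of the patch: `reflink_plan` is a hypothetical helper, `/mnt/scratch/tmp` is a placeholder path, and the `reflink <src> <src_off> <dst_off> <len>` argument order is taken from the xfs_io invocations shown in the thread.

```bash
# reflink_plan NR BLOCKSIZE FILE: print (do not run) the xfs_io commands
# that clone block 0 of FILE to each of the next NR block-sized offsets.
reflink_plan() {
	_nr=$1
	_bs=$2
	_file=$3
	_i=1
	while [ "$_i" -le "$_nr" ]; do
		echo "xfs_io -c \"reflink $_file 0 $(( _i * _bs )) $_bs\" $_file"
		_i=$(( _i + 1 ))
	done
}

# Example: the first three clones with the 128 KiB block size from the test.
# The path is a placeholder, not a real scratch mount.
reflink_plan 3 $(( 128 * 1024 )) /mnt/scratch/tmp

# After all NR clones the file spans (NR + 1) blocks that share one extent.
echo "file size for nr=4096: $(( (4096 + 1) * 128 * 1024 )) bytes"
```

Every printed command uses source offset 0, so all nr + 1 file extents end up referencing the same on-disk extent, which is the layout that makes the backref walk in fiemap so expensive.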