Message ID | 20150402041041.GF16808@dhcp-13-216.nay.redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 02, 2015 at 12:10:41PM +0800, Eryu Guan wrote:
> On Thu, Apr 02, 2015 at 09:29:54AM +1100, Dave Chinner wrote:
> > On Wed, Apr 01, 2015 at 01:57:10PM -0400, Brian Foster wrote:
> > > On Wed, Apr 01, 2015 at 09:54:15PM +0800, Eryu Guan wrote:
> > > > xfs used to panic in this test, this xfs commit fixes the bug
> > > >
> > > > 8d6c121 xfs: fix buffer use after free on IO error
> > > >
> > > > ext4 and btrfs trigger WARNING on current 4.0-rc3 kernel
> > > >
> > > > Signed-off-by: Eryu Guan <eguan@redhat.com>
> > > > ---
> > > > v2:
> > > > - add _require_dm_snapshot() function to require dm snapshot target
> > > > - make sure SCRATCH_DEV has enough space for the test
> > > > - fail the test directly when failures detected in setup phase
> > > >
> > >
> > > FYI, the mail subject header hasn't changed so Dave might not notice
> > > this is a new patch.
> >
> > Saw it.
> >
> > > > +_mount /dev/mapper/$vgname-$snapname $mnt
> > > > +
> > > > +# write 5M data to the snapshot
> > > > +$XFS_IO_PROG -fc "pwrite 0 5m" $mnt/testfile >>$seqres.full 2>&1
> > > > +
> > >
> > > I noticed there were no errors in $seqres.full when running this test.
> > > E.g., the pwrite succeeds because nothing is written back to disk at
> > > that point. The fs does shut down due to the flush on umount, but it's
> > > kind of hidden away up in the _cleanup() function.
> > >
> > > Kind of a nit, but we could be a bit more explicit and do a '-c fsync'
> > > after the pwrite here? That way it's clear that writeback to disk is
> > > part of the core test and we have a little feedback in $seqres.full that
> > > I/O errors occurred, as expected.
> >
> > Added the -c fsync as I pulled it in.
>
> I was thinking about adding fsync or sync at first, but it causes
> problems in cleanup. For some reason (I don't know exactly why) fsync
> sometimes pins the snapshot in use and vgremove can't remove all lvs, so
> SCRATCH_DEV is still used by the test vg when running the next test:
>
> our local _scratch_mkfs routine ...
> mkfs.xfs: cannot open /dev/sda6: Device or resource busy
> check: failed to mkfs $SCRATCH_DEV using specified options
>
> From 081.full I can see
>
> Logical volume vg_081/snap_081 contains a filesystem in use.
>
> The test vg/lv has to be removed manually after the test:
>
> vgremove -f vg_081
>
> I haven't found a proper way to fix it yet, but simply adding 'sleep 1'
> before vgremove in _cleanup works for me

Hmm, I haven't seen that one. Seems fine for me as a temporary
workaround though. Perhaps include it with the udev-settle thing that
Dave needs on the creation side of things..?

Brian

> [root@hp-dl388eg8-01 xfstests]# git diff
> diff --git a/tests/generic/081 b/tests/generic/081
> index 3e17d34..29b0240 100755
> --- a/tests/generic/081
> +++ b/tests/generic/081
> @@ -36,6 +36,7 @@ _cleanup()
>  	rm -f $tmp.*
>  	# lvm may have umounted it on I/O error, but in case it does not
>  	$UMOUNT_PROG $mnt >/dev/null 2>&1
> +	sleep 1
>  	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
>  	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
>  }
>
> Thanks,
> Eryu
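The change Brian is asking for is a one-liner. As a rough sketch (the exact
line Dave committed isn't shown in this thread, so take it as an assumption
rather than the final version), the write in generic/081 would become:

# write 5M data to the snapshot and force it to disk, so the expected
# I/O errors show up in $seqres.full while the test runs, rather than
# only at the umount hidden in _cleanup()
$XFS_IO_PROG -fc "pwrite 0 5m" -c fsync $mnt/testfile >>$seqres.full 2>&1

With fsync in the command chain, xfs_io should report the writeback error
directly, which is exactly the feedback in $seqres.full Brian is after.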
On Thu, Apr 02, 2015 at 07:11:45AM -0400, Brian Foster wrote:
> On Thu, Apr 02, 2015 at 12:10:41PM +0800, Eryu Guan wrote:
> > On Thu, Apr 02, 2015 at 09:29:54AM +1100, Dave Chinner wrote:
> > > On Wed, Apr 01, 2015 at 01:57:10PM -0400, Brian Foster wrote:
> > > > On Wed, Apr 01, 2015 at 09:54:15PM +0800, Eryu Guan wrote:
> > > > Kind of a nit, but we could be a bit more explicit and do a '-c fsync'
> > > > after the pwrite here? That way it's clear that writeback to disk is
> > > > part of the core test and we have a little feedback in $seqres.full that
> > > > I/O errors occurred, as expected.
> > >
> > > Added the -c fsync as I pulled it in.
> >
> > I was thinking about adding fsync or sync at first, but it causes
> > problems in cleanup. For some reason (I don't know exactly why) fsync
> > sometimes pins the snapshot in use and vgremove can't remove all lvs, so
> > SCRATCH_DEV is still used by the test vg when running the next test:
> >
> > our local _scratch_mkfs routine ...
> > mkfs.xfs: cannot open /dev/sda6: Device or resource busy
> > check: failed to mkfs $SCRATCH_DEV using specified options
> >
> > From 081.full I can see
> >
> > Logical volume vg_081/snap_081 contains a filesystem in use.
> >
> > The test vg/lv has to be removed manually after the test:
> >
> > vgremove -f vg_081
> >
> > I haven't found a proper way to fix it yet, but simply adding 'sleep 1'
> > before vgremove in _cleanup works for me
>
> Hmm, I haven't seen that one. Seems fine for me as a temporary
> workaround though. Perhaps include it with the udev-settle thing that
> Dave needs on the creation side of things..?

More than likely. I hate udev at times....

FWIW, there are also other issues here, and I think I've finally also
worked out why dm-flakey tests have interesting problems when run on
ramdisks.

$ cat /home/dave/src/xfstests-dev/results//xfs/generic/081.full
[ snip scratch_mkfs output ]
  Physical volume "/dev/ram1" successfully created
  Volume group "vg_081" successfully created
  Logical volume "base_081" created
  Volume group "vg_081" not found
Failed to create snapshot
  Volume group "vg_081" not found
  Skipping volume group vg_081
  Can't open /dev/ram1 exclusively - not removing.  Mounted filesystem?

Which is interesting, because:

$ sudo lvm vgcreate foobar /dev/ram1
  Physical volume "/dev/ram1" successfully created
  Volume group "foobar" successfully created
$ sudo lvm lvcreate -L 256M -n wtf foobar
  Logical volume "wtf" created
$ sudo lvm pvscan
  PV /dev/ram1   VG foobar   lvm2 [3.81 GiB / 3.56 GiB free]
  Total: 1 [3.81 GiB] / in use: 1 [3.81 GiB] / in no VG: 0 [0   ]
$ sudo lvm lvscan
  ACTIVE            '/dev/foobar/wtf' [256.00 MiB] inherit
(failed reverse-i-search)`hexd': sudo MKFS_OPTIONS="-m crc=1,finobt=1" ./c^Cck -g auto
$ sudo hexdump /dev/mapper/foobar-wtf
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
10000000
$ sudo hexdump /dev/ram1
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 414c 4542 4f4c 454e 0001 0000 0000 0000
0000210 5d39 644e 0020 0000 564c 324d 3020 3130
0000220 6a54 6b6b 5a5a 7175 4c70 5434 5866 6450
0000230 4f73 6552 5330 3333 4c79 4157 6144 7678
0000240 0000 f424 0000 0000 0000 0010 0000 0000
.....
$ sudo mkfs.xfs /dev/mapper/foobar-wtf
meta-data=/dev/mapper/foobar-wtf isize=256    agcount=4, agsize=16384 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=65536, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=1605, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo hexdump /dev/ram1
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
f4240000
$ sudo lvm lvscan
  No volume groups found
$

because running mkfs.xfs on the LVM volume is causing the entire
underlying physical device to be erased. The machine has to be rebooted
at this point because /dev/ram1 is now stuck with references that cannot
be removed.

Oh, I bet the problem is the final flush ioctl that mkfs sends:

3420  pwrite(4, "XFSB\0\0\20\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
3420  fsync(4)                          = 0
3420  ioctl(4, BLKFLSBUF, 0)            = 0
3420  close(4)                          = 0
3420  exit_group(0)                     = ?
3420  +++ exited with 0 +++

Now, when mkfs.xfs is run directly on the ramdisk, it holds a reference
to the ramdisk and so brd ignores BLKFLSBUF. I bet that LVM doesn't hold
a reference to the underlying block device, and so when brd receives the
BLKFLSBUF passed down from /dev/mapper/foobar-wtf it sees no references
and so removes all pages...

This seems like a proper screwup - the brd code is abusing BLKFLSBUF to
mean "free all the memory" rather than "flush dirty buffers to stable
storage" like everyone else expects it to be.

Hence I suspect we need a new _require rule here for DM-based tests to
stop them from running on brd devices. I'll have a look at rolling it
into the DM requires statements and see how that goes...

Cheers,

Dave.
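For reference, a minimal sketch of the kind of guard Dave is describing;
the helper name and the check below are assumptions, not the rule he
actually ended up writing. brd registers its devices under block major 1,
so one way to detect a ramdisk-backed SCRATCH_DEV would be:

# hypothetical helper: skip device-mapper tests when SCRATCH_DEV is a
# brd ramdisk, since brd frees all of its pages on BLKFLSBUF and the
# stacked DM device passes that ioctl straight down to it
_require_scratch_not_ramdisk()
{
	local dev=`readlink -f $SCRATCH_DEV`
	# for a block special file, stat %t prints the major number in hex;
	# brd (ramdisk) devices are major 1
	[ "`stat -c '%t' $dev`" = "1" ] && \
		_notrun "cannot run device-mapper tests on a brd ramdisk"
}

If the BLKFLSBUF theory is right, 'blockdev --flushbufs /dev/mapper/foobar-wtf'
(which issues the same ioctl mkfs sends) should wipe the ramdisk with no
mkfs involved at all, which would be a quick way to confirm it.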
diff --git a/tests/generic/081 b/tests/generic/081
index 3e17d34..29b0240 100755
--- a/tests/generic/081
+++ b/tests/generic/081
@@ -36,6 +36,7 @@ _cleanup()
 	rm -f $tmp.*
 	# lvm may have umounted it on I/O error, but in case it does not
 	$UMOUNT_PROG $mnt >/dev/null 2>&1
+	sleep 1
 	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
 }
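Brian's suggestion above, tying the delay to udev rather than a fixed
sleep, might end up looking something like the sketch below. 'udevadm
settle' is an assumption about what the eventual fix would use, and
whether settling udev actually releases whatever pins the snapshot is
untested; the 'sleep 1' in the diff above remains the known-working
workaround.

_cleanup()
{
	rm -f $tmp.*
	# lvm may have umounted it on I/O error, but in case it does not
	$UMOUNT_PROG $mnt >/dev/null 2>&1
	# wait for udev to finish processing the snapshot device instead
	# of guessing with a fixed sleep
	udevadm settle >/dev/null 2>&1
	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
}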