Message ID | 20191212110204.11128-3-Damenly_Su@gmx.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs-progs: metadata_uuid feature fixes and portation |
On 12.12.19 at 13:01, damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
>
> The 034 test may fail to mount, and dmesg says open_ctree() failed due
> to device missing.
>
> The partial workflow is
> step1 loop1 = losetup image1
> step2 loop2 = losetup image2
> step3 mount loop1
>
> The dmesg says the loop2 device is missing.
> It's possible and known that while step3 is in open_ctree() and
> fs_devices->opened is nonzero, loop2 device has not been added into the

Care to give more details how this can happen? I haven't observed such a
failure, meaning it's likely due to some race condition. More details
are needed though. In your change log you say "it's known" but
apparently only to you in this case.
On 2020/1/31 4:03 PM, Nikolay Borisov wrote:
>
>
> On 12.12.19 at 13:01, damenly.su@gmail.com wrote:
>> From: Su Yue <Damenly_Su@gmx.com>
>>
>> The 034 test may fail to mount, and dmesg says open_ctree() failed due
>> to device missing.
>>
>> The partial workflow is
>> step1 loop1 = losetup image1
>> step2 loop2 = losetup image2
>> step3 mount loop1
>>
>> The dmesg says the loop2 device is missing.
>> It's possible and known that while step3 is in open_ctree() and
>> fs_devices->opened is nonzero, loop2 device has not been added into the
>
>
> Care to give more details how this can happen? I haven't observed such a
> failure, meaning it's likely due to some race condition. More details
> are needed though. In your change log you say "it's known" but
> apparently only to you in this case.
>

Sure. There's a device missing situation[1] if two
devices (raid 1/0) were caught by udev. Yes, it's
not related to the metadata fsid feature. It just
makes the mount operation fail due to the missing device, and then
the test fails.

In this script, $loop1 *may* fail to mount because
$loop2 is "missing". Mounting the $loop2 device can still verify the
metadata fsid functionality, and without the degraded option.

[1]: https://www.spinics.net/lists/linux-btrfs/msg96312.html
On 31.01.20 at 12:01, Su Yue wrote:
> On 2020/1/31 4:03 PM, Nikolay Borisov wrote:
>>
>>
>> On 12.12.19 at 13:01, damenly.su@gmail.com wrote:
>>> From: Su Yue <Damenly_Su@gmx.com>
>>>
>>> The 034 test may fail to mount, and dmesg says open_ctree() failed due
>>> to device missing.
>>>
>>> The partial workflow is
>>> step1 loop1 = losetup image1
>>> step2 loop2 = losetup image2
>>> step3 mount loop1
>>>
>>> The dmesg says the loop2 device is missing.
>>> It's possible and known that while step3 is in open_ctree() and
>>> fs_devices->opened is nonzero, loop2 device has not been added into the
>>
>>
>> Care to give more details how this can happen? I haven't observed such a
>> failure, meaning it's likely due to some race condition. More details
>> are needed though. In your change log you say "it's known" but
>> apparently only to you in this case.
>>
>
> Sure. There's a device missing situation[1] if two
> devices (raid 1/0) were caught by udev. Yes, it's
> not related to the metadata fsid feature. It just
> makes the mount operation fail due to the missing device, and then
> the test fails.

Ok but in those mail posts it says the problem occurs if we have a
multi-device btrfs volume, in this case raid1, and one of the devices is
missing. The pertinent question is why would any of the testing devices
be missing? Did you actually experience such a failure? loop1 is acquired
after running losetup --find --show, implying that after the command is
finished the given loopback device is fully present to the system?

>
> In this script, $loop1 *may* fail to mount because
> $loop2 is "missing". Mounting the $loop2 device can still verify the
> metadata fsid functionality, and without the degraded option.
>
>
> [1]: https://www.spinics.net/lists/linux-btrfs/msg96312.html
On 2020/1/31 8:47 PM, Nikolay Borisov wrote:
>
>
> On 31.01.20 at 12:01, Su Yue wrote:
>> On 2020/1/31 4:03 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 12.12.19 at 13:01, damenly.su@gmail.com wrote:
>>>> From: Su Yue <Damenly_Su@gmx.com>
>>>>
>>>> The 034 test may fail to mount, and dmesg says open_ctree() failed due
>>>> to device missing.
>>>>
>>>> The partial workflow is
>>>> step1 loop1 = losetup image1
>>>> step2 loop2 = losetup image2
>>>> step3 mount loop1
>>>>
>>>> The dmesg says the loop2 device is missing.
>>>> It's possible and known that while step3 is in open_ctree() and
>>>> fs_devices->opened is nonzero, loop2 device has not been added into the
>>>
>>>
>>> Care to give more details how this can happen? I haven't observed such a
>>> failure, meaning it's likely due to some race condition. More details
>>> are needed though. In your change log you say "it's known" but
>>> apparently only to you in this case.
>>>
>>
>> Sure. There's a device missing situation[1] if two
>> devices (raid 1/0) were caught by udev. Yes, it's
>> not related to the metadata fsid feature. It just
>> makes the mount operation fail due to the missing device, and then
>> the test fails.
>
> Ok but in those mail posts it says the problem occurs if we have a
> multi-device btrfs volume, in this case raid1, and one of the devices is
> missing. The pertinent question is why would any of the testing devices
> be missing? Did you actually experience such a failure? loop1 is acquired
> after running losetup --find --show, implying that after the command is
> finished the given loopback device is fully present to the system?
>
>

Yes, I did experience such failures. Although I'm not familiar with
udevd, I found something that answers your questions. My superficial
answers are below, after looking through the loop device and udevd code.
If you find something wrong, please correct me.

After "losetup --find --show", the loopback devices are fully present to
the system, and userspace has received uevents from the kernel.

losetup only handles the loopback device itself, not the filesystem on
top of it. It is udevd's job to handle the uevent for the device
according to its rules. The issue is that udevd may still be handling
the uevent of one device while the other device is being mounted.
For btrfs, udevd calls ioctls on /dev/btrfs-control.

    Thread *mounting device1*        Thread *scanning device2*

    btrfs_mount_root                 btrfs_control_ioctl

    mutex_lock(&uuid_mutex);

      btrfs_read_disk_super
      btrfs_scan_one_device
      --> there is only device1
          in the fs_devices

      btrfs_open_devices
        fs_devices->opened = 1

    mutex_unlock(&uuid_mutex);

                                     mutex_lock(&uuid_mutex);
                                     btrfs_scan_one_device
                                       btrfs_read_disk_super
                                       device_list_add
                                         found fs_devices
                                         device = btrfs_find_device
                                         if (!device)
                                           if (fs_devices->opened)
                                             return -EBUSY
                                         --> adding device2 aborts
                                             since fs_devices was opened
                                     mutex_unlock(&uuid_mutex);

    btrfs_fill_super
      open_ctree
        btrfs_read_sys_array
          read_one_chunk
          --> error due to the device2 missing

Then the mount fails because of the missing device.

>
>>
>> In this script, $loop1 *may* fail to mount because
>> $loop2 is "missing". Mounting the $loop2 device can still verify the
>> metadata fsid functionality, and without the degraded option.
>>
>>
>> [1]: https://www.spinics.net/lists/linux-btrfs/msg96312.html
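The interleaving in the reply above can be replayed as a toy model in plain shell. This is only a sketch under stated assumptions: nothing here touches real devices, the functions `scan_device`, `mount_open` and `mount_read_chunks` are hypothetical stand-ins for the kernel paths named in the trace, and the model assumes that a failed mount closes the devices again so that `fs_devices->opened` drops back to zero.

```shell
#!/bin/bash
# Toy model of the race; hypothetical helpers, no real devices involved.

ALL_DEVICES="loop1 loop2"   # members of the two-device filesystem
registered=""               # devices currently listed in fs_devices
opened=0                    # models fs_devices->opened

is_registered() {
	case " $registered " in
		*" $1 "*) return 0 ;;
		*) return 1 ;;
	esac
}

# Models btrfs_scan_one_device -> device_list_add.
scan_device() {
	is_registered "$1" && return 0
	if [ "$opened" -ne 0 ]; then
		return 16	# stands in for -EBUSY
	fi
	registered="$registered $1"
}

# Models the part of mount that runs under uuid_mutex.
mount_open() {
	scan_device "$1" || return 1
	opened=1
}

# Models open_ctree -> read_one_chunk: every member must be registered.
mount_read_chunks() {
	local dev
	for dev in $ALL_DEVICES; do
		if ! is_registered "$dev"; then
			opened=0	# assumption: a failed mount closes the devices
			return 1
		fi
	done
	return 0
}

# Interleaving from the trace: udevd's scan of loop2 lands mid-mount.
scan_rc=0; first_mount_rc=0; second_mount_rc=0
mount_open loop1
scan_device loop2 || scan_rc=$?
mount_read_chunks || first_mount_rc=$?

# Fallback taken by the patch: mount through loop2 instead.
mount_open loop2
mount_read_chunks || second_mount_rc=$?

echo "scan=$scan_rc first=$first_mount_rc second=$second_mount_rc"
```

In this model the second mount succeeds because mounting through loop2 scans loop2 itself while the filesystem is no longer open, so both members end up registered. Whether the real kernel behaves the same way in every case is not established by the sketch.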
diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
index ff51bf22fadf..5fe553705fcf 100755
--- a/tests/misc-tests/034-metadata-uuid/test.sh
+++ b/tests/misc-tests/034-metadata-uuid/test.sh
@@ -173,7 +173,9 @@ failure_recovery() {
 	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
 
 	# Mount and unmount, on trans commit all disks should be consistent
-	run_check $SUDO_HELPER mount "$loop1" "$TEST_MNT"
+	run_mayfail $SUDO_HELPER mount "$loop1" "$TEST_MNT"
+	[ $? -ne 0 ] && run_check $SUDO_HELPER mount "$loop2" "$TEST_MNT"
+
 	run_check $SUDO_HELPER umount "$TEST_MNT"
 
 	# perform any specific check
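The hunk reads `$?` on the line right after `run_mayfail`, so it only works as long as nothing is inserted between the two lines. As a sketch of the same fallback written with an explicit `if`, the block below uses hypothetical, simplified stand-ins for `run_mayfail` and `run_check` (the real helpers in the test suite do more) and a fake mount command, so it can run without root.

```shell
#!/bin/bash
# Hypothetical, simplified stand-ins for the test suite's helpers.
run_mayfail() { "$@"; }
run_check() { "$@" || { echo "failed: $*" >&2; return 1; }; }

# Hypothetical stand-in for mount: only loop2 works, as in the failing run.
fake_mount() { [ "$1" = "loop2" ]; }

loop1=loop1
loop2=loop2
mnt=/mnt/test
mounted_via=""

# Try loop1 first; fall back to loop2 only if that mount fails.
if run_mayfail fake_mount "$loop1" "$mnt"; then
	mounted_via="$loop1"
else
	run_check fake_mount "$loop2" "$mnt" && mounted_via="$loop2"
fi

echo "mounted via $mounted_via"
```

The `if` form keeps the status check attached to the command it belongs to, at the cost of two extra lines.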