
[02/11] btrfs-progs: misc-tests/034: mount the second device if first device mount failed

Message ID 20191212110204.11128-3-Damenly_Su@gmx.com
State New, archived
Series btrfs-progs: metadata_uuid feature fixes and portation

Commit Message

Su Yue Dec. 12, 2019, 11:01 a.m. UTC
From: Su Yue <Damenly_Su@gmx.com>

Test 034 may fail to mount, with dmesg reporting that open_ctree() failed
due to a missing device.

The relevant part of the workflow is:
step1 loop1 = losetup image1
step2 loop2 = losetup image2
step3 mount loop1

dmesg says the loop2 device is missing.
It's possible and known that while step3 is in open_ctree() and
fs_devices->opened is nonzero, the loop2 device has not yet been added
to fs_devices. read_one_chunk() then reports that loop2 is missing.

Work around this in the test by mounting loop2 if mounting loop1 fails.

Fixes: 0de2e22ad226 ("btrfs-progs: tests: Add tests for changing fsid feature")
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
---
 tests/misc-tests/034-metadata-uuid/test.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov Jan. 31, 2020, 8:03 a.m. UTC | #1
On 12.12.19 13:01, damenly.su@gmail.com wrote:
> From: Su Yue <Damenly_Su@gmx.com>
> 
> Test 034 may fail to mount, with dmesg reporting that open_ctree() failed
> due to a missing device.
> 
> The relevant part of the workflow is:
> step1 loop1 = losetup image1
> step2 loop2 = losetup image2
> step3 mount loop1
> 
> dmesg says the loop2 device is missing.
> It's possible and known that while step3 is in open_ctree() and
> fs_devices->opened is nonzero, the loop2 device has not yet been added


Care to give more details on how this can happen? I haven't observed such
a failure, so it's likely due to some race condition. More details are
needed, though. In your changelog you say "it's known", but apparently
only to you in this case.
Su Yue Jan. 31, 2020, 10:01 a.m. UTC | #2
On 2020/1/31 4:03 PM, Nikolay Borisov wrote:
> Care to give more details on how this can happen? I haven't observed such
> a failure, so it's likely due to some race condition. More details are
> needed, though. In your changelog you say "it's known", but apparently
> only to you in this case.
>

Sure. There is a missing-device situation[1] when the two devices of a
RAID1/RAID0 filesystem are both being handled by udev. Yes, it's not
related to the metadata fsid feature; it just makes the mount operation
fail due to the missing device, and then the test fails.

In this script, mounting $loop1 *may* fail because $loop2 is "missing".
Mounting the $loop2 device instead still verifies the metadata fsid
functionality, without needing the degraded option.


[1]: https://www.spinics.net/lists/linux-btrfs/msg96312.html
Nikolay Borisov Jan. 31, 2020, 12:47 p.m. UTC | #3
On 31.01.20 12:01, Su Yue wrote:
> Sure. There is a missing-device situation[1] when the two devices of a
> RAID1/RAID0 filesystem are both being handled by udev. Yes, it's not
> related to the metadata fsid feature; it just makes the mount operation
> fail due to the missing device, and then the test fails.

Ok, but those mail posts say the problem occurs when we have a
multi-device btrfs volume, in this case raid1, and one of the devices is
missing. The pertinent question is: why would any of the test devices
be missing? Did you actually experience such a failure? loop1 is acquired
by running losetup --find --show; doesn't that imply that once the command
has finished, the given loopback device is fully present to the system?

Su Yue Feb. 4, 2020, 4:40 a.m. UTC | #4
On 2020/1/31 8:47 PM, Nikolay Borisov wrote:
> Ok, but those mail posts say the problem occurs when we have a
> multi-device btrfs volume, in this case raid1, and one of the devices is
> missing. The pertinent question is: why would any of the test devices
> be missing? Did you actually experience such a failure? loop1 is acquired
> by running losetup --find --show; doesn't that imply that once the command
> has finished, the given loopback device is fully present to the system?
>
>
Yes, I did experience such failures. Although I'm not familiar with
udevd, I found something relevant to your questions after looking
through the loop device and udevd code. My superficial answers are
below; if you find anything wrong, please correct me.

After "losetup --find --show" returns, the loopback devices are fully
present to the system, and userspace has received uevents from the
kernel. losetup only sets up the loopback device itself; it does not deal
with the filesystem on it. Handling the uevent for the device according
to its rules is udevd's job.

The issue is that udevd may still be handling the uevent of one device
while the other device is being mounted. For btrfs, udevd calls ioctls
on /dev/btrfs-control.


Thread *mounting device1*              Thread *scanning device2*

btrfs_mount_root                       btrfs_control_ioctl
  mutex_lock(&uuid_mutex);
    btrfs_read_disk_super
    btrfs_scan_one_device
    --> there is only device1
        in the fs_devices

    btrfs_open_devices
      fs_devices->opened = 1
    mutex_unlock(&uuid_mutex);

                                       mutex_lock(&uuid_mutex);
                                       btrfs_scan_one_device
                                         btrfs_read_disk_super

                                         device_list_add
                                           found fs_devices
                                             device = btrfs_find_device

                                           if (!device)
                                             if (fs_devices->opened)
                                               return -EBUSY
                                             --> adding device2 aborts
                                                 since fs_devices was
                                                 already opened
                                       mutex_unlock(&uuid_mutex);
  btrfs_fill_super
    open_ctree
      btrfs_read_sys_array
        read_one_chunk
        --> error due to the
            missing device2


The mount then fails because of the missing device.



Patch

diff --git a/tests/misc-tests/034-metadata-uuid/test.sh b/tests/misc-tests/034-metadata-uuid/test.sh
index ff51bf22fadf..5fe553705fcf 100755
--- a/tests/misc-tests/034-metadata-uuid/test.sh
+++ b/tests/misc-tests/034-metadata-uuid/test.sh
@@ -173,7 +173,9 @@  failure_recovery() {
 	loop2=$(run_check_stdout $SUDO_HELPER losetup --find --show "$image2")
 
 	# Mount and unmount, on trans commit all disks should be consistent
-	run_check $SUDO_HELPER mount "$loop1" "$TEST_MNT"
+	run_mayfail $SUDO_HELPER mount "$loop1" "$TEST_MNT"
+	[ $? -ne 0 ] && run_check $SUDO_HELPER mount "$loop2" "$TEST_MNT"
+
 	run_check $SUDO_HELPER umount "$TEST_MNT"
 
 	# perform any specific check