diff mbox series

[v2] btrfs: do not skip re-registration for the mounted device

Message ID 88673c60b1d866c289ef019945647adfc8ab51d0.1707781507.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs: do not skip re-registration for the mounted device | expand

Commit Message

Anand Jain Feb. 13, 2024, 1:13 a.m. UTC
There are reports that since version 6.7 update-grub fails to find the
device of the root on systems without initrd and on a single device.

This looks like the device name changed in the output of
/proc/self/mountinfo:

6.5-rc5 working

  18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...

6.7 not working:

  17 1 0:15 / / rw,noatime - btrfs /dev/root ...

and "update-grub" shows this error:

  /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)

This looks like it's related to the device name, but grub-probe
recognizes the "/dev/root" path and tries to find the underlying device.
However there's a special case for some filesystems, for btrfs in
particular.

The generic root device detection heuristic is not done and it all
relies on reading the device infos by a btrfs specific ioctl. This ioctl
returns the device name as it was saved at the time of device scan (in
this case it's /dev/root).

The change in 6.7 for temp_fsid to allow several single device
filesystem to exist with the same fsid (and transparently generate a new
UUID at mount time) was to skip caching/registering such devices.

This also skipped mounted device. One step of scanning is to check if
the device name hasn't changed, and if yes then update the cached value.

This broke the grub-probe as it always read the device /dev/root and
couldn't find it in the system. A temporary workaround is to create a
symlink but this does not survive reboot.

The right fix is to allow updating the device path of a mounted
filesystem even if this is a single device one.

In the fix, check if the device's major:minor number matches with the
cached device. If they do, then we can allow the scan to happen so that
device_list_add() can take care of updating the device path. The file
descriptor remains unchanged.

This does not affect the temp_fsid feature, the UUID of the mounted
filesystem remains the same and the matching is based on device major:minor
which is unique per mounted filesystem.

This covers the path when the device (that exists for all mounted
devices) name changes, updating /dev/root to /dev/sdx. Any other single
device with filesystem and is not mounted is still skipped.

Note that if a system is booted and initial mount is done on the
/dev/root device, this will be the cached name of the device. Only after
the command "btrfs device scan" it will change as it triggers the
rename.

The fix was verified by users whose systems were affected.

CC: stable@vger.kernel.org # 6.7+
Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Tested-by: Alex Romosan <aromosan@gmail.com>
Tested-by: CHECK_1234543212345@protonmail.com
---

v2:
 Updated git commit log from [PATCH] with permission. Thx.
     [PATCH] btrfs: always scan a single device when mounted
 Add Tested-by.
 
 fs/btrfs/volumes.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

Comments

David Sterba Feb. 14, 2024, 7:16 a.m. UTC | #1
On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
> There are reports that since version 6.7 update-grub fails to find the
> device of the root on systems without initrd and on a single device.
> 
> This looks like the device name changed in the output of
> /proc/self/mountinfo:
> 
> 6.5-rc5 working
> 
>   18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...
> 
> 6.7 not working:
> 
>   17 1 0:15 / / rw,noatime - btrfs /dev/root ...
> 
> and "update-grub" shows this error:
> 
>   /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)
> 
> This looks like it's related to the device name, but grub-probe
> recognizes the "/dev/root" path and tries to find the underlying device.
> However there's a special case for some filesystems, for btrfs in
> particular.
> 
> The generic root device detection heuristic is not done and it all
> relies on reading the device infos by a btrfs specific ioctl. This ioctl
> returns the device name as it was saved at the time of device scan (in
> this case it's /dev/root).
> 
> The change in 6.7 for temp_fsid to allow several single device
> filesystem to exist with the same fsid (and transparently generate a new
> UUID at mount time) was to skip caching/registering such devices.
> 
> This also skipped mounted device. One step of scanning is to check if
> the device name hasn't changed, and if yes then update the cached value.
> 
> This broke the grub-probe as it always read the device /dev/root and
> couldn't find it in the system. A temporary workaround is to create a
> symlink but this does not survive reboot.
> 
> The right fix is to allow updating the device path of a mounted
> filesystem even if this is a single device one.
> 
> In the fix, check if the device's major:minor number matches with the
> cached device. If they do, then we can allow the scan to happen so that
> device_list_add() can take care of updating the device path. The file
> descriptor remains unchanged.
> 
> This does not affect the temp_fsid feature, the UUID of the mounted
> filesystem remains the same and the matching is based on device major:minor
> which is unique per mounted filesystem.
> 
> This covers the path when the device (that exists for all mounted
> devices) name changes, updating /dev/root to /dev/sdx. Any other single
> device with filesystem and is not mounted is still skipped.
> 
> Note that if a system is booted and initial mount is done on the
> /dev/root device, this will be the cached name of the device. Only after
> the command "btrfs device scan" it will change as it triggers the
> rename.
> 
> The fix was verified by users whose systems were affected.
> 
> CC: stable@vger.kernel.org # 6.7+
> Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem")
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
> Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Tested-by: Alex Romosan <aromosan@gmail.com>
> Tested-by: CHECK_1234543212345@protonmail.com

Reviewed-by: David Sterba <dsterba@suse.com>

When you commit the patch, please reorder the tags according to
https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
Filipe Manana Feb. 20, 2024, 2:08 p.m. UTC | #2
On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
> > There are reports that since version 6.7 update-grub fails to find the
> > device of the root on systems without initrd and on a single device.
> >
> > This looks like the device name changed in the output of
> > /proc/self/mountinfo:
> >
> > 6.5-rc5 working
> >
> >   18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...
> >
> > 6.7 not working:
> >
> >   17 1 0:15 / / rw,noatime - btrfs /dev/root ...
> >
> > and "update-grub" shows this error:
> >
> >   /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)
> >
> > This looks like it's related to the device name, but grub-probe
> > recognizes the "/dev/root" path and tries to find the underlying device.
> > However there's a special case for some filesystems, for btrfs in
> > particular.
> >
> > The generic root device detection heuristic is not done and it all
> > relies on reading the device infos by a btrfs specific ioctl. This ioctl
> > returns the device name as it was saved at the time of device scan (in
> > this case it's /dev/root).
> >
> > The change in 6.7 for temp_fsid to allow several single device
> > filesystem to exist with the same fsid (and transparently generate a new
> > UUID at mount time) was to skip caching/registering such devices.
> >
> > This also skipped mounted device. One step of scanning is to check if
> > the device name hasn't changed, and if yes then update the cached value.
> >
> > This broke the grub-probe as it always read the device /dev/root and
> > couldn't find it in the system. A temporary workaround is to create a
> > symlink but this does not survive reboot.
> >
> > The right fix is to allow updating the device path of a mounted
> > filesystem even if this is a single device one.
> >
> > In the fix, check if the device's major:minor number matches with the
> > cached device. If they do, then we can allow the scan to happen so that
> > device_list_add() can take care of updating the device path. The file
> > descriptor remains unchanged.
> >
> > This does not affect the temp_fsid feature, the UUID of the mounted
> > filesystem remains the same and the matching is based on device major:minor
> > which is unique per mounted filesystem.
> >
> > This covers the path when the device (that exists for all mounted
> > devices) name changes, updating /dev/root to /dev/sdx. Any other single
> > device with filesystem and is not mounted is still skipped.
> >
> > Note that if a system is booted and initial mount is done on the
> > /dev/root device, this will be the cached name of the device. Only after
> > the command "btrfs device scan" it will change as it triggers the
> > rename.
> >
> > The fix was verified by users whose systems were affected.
> >
> > CC: stable@vger.kernel.org # 6.7+
> > Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem")
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
> > Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/
> > Signed-off-by: Anand Jain <anand.jain@oracle.com>
> > Tested-by: Alex Romosan <aromosan@gmail.com>
> > Tested-by: CHECK_1234543212345@protonmail.com
>
> Reviewed-by: David Sterba <dsterba@suse.com>
>
> When you commit the patch, please reorder the tags according to
> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering

So this introduces a regression.

Running fstests:

(...)
btrfs/156 11s ...  16s
btrfs/157 3s ...  0s
btrfs/158 0s ...  2s
btrfs/159 16s ... - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad)
    --- tests/btrfs/159.out     2020-10-26 15:31:57.061207266 +0000
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad
 2024-02-20 12:54:43.386131546 +0000
    @@ -1,8 +1,11 @@
     QA output created by 159
    +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad
option, bad superblock on /dev/mapper/flakey-test, missing codepage or
helper program, or other error.
    +       dmesg(1) may have more information after failed mount system call.
     File digest before power failure:
    -f049865ed45b1991dc9a299b47d51dbf  SCRATCH_MNT/foobar
    +b2e8facfb4795185fadd85707fe78973  SCRATCH_MNT/foobar
    +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted.
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out
/home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad'  to see
the entire diff)
btrfs/160 4s ...  4s
(...)

The weird thing is it doesn't happen when running btrfs/159 standalone
(even if doing a rmmod btrfs before).

Instead of running all tests, I managed to reproduce it with only:

$ ./check btrfs/14[6-9] btrfs/15[8-9]
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.8.0-rc5-btrfs-next-151+ #1 SMP
PREEMPT_DYNAMIC Mon Feb 19 13:38:37 WET 2024
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/146 1s ...  2s
btrfs/147 0s ...  1s
btrfs/148 2s ...  2s
btrfs/149 1s ...  1s
btrfs/158 1s ...  0s
btrfs/159 20s ... - output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad)
    --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad
2024-02-20 13:51:25.707220763 +0000
    @@ -1,8 +1,11 @@
     QA output created by 159
    +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad
option, bad superblock on /dev/mapper/flakey-test, missing codepage or
helper program, or other error.
    +       dmesg(1) may have more information after failed mount system call.
     File digest before power failure:
    -f049865ed45b1991dc9a299b47d51dbf  SCRATCH_MNT/foobar
    +b2e8facfb4795185fadd85707fe78973  SCRATCH_MNT/foobar
    +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted.
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out
/home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad'  to see
the entire diff)
Ran: btrfs/146 btrfs/147 btrfs/148 btrfs/149 btrfs/158 btrfs/159

dmesg shows:

[79195.239769] run fstests btrfs/159 at 2024-02-20 14:06:02
[79195.418917] BTRFS: device fsid 45ac9151-7e2d-4dd7-bf75-99967f869f2a
devid 1 transid 3747 /dev/sdb scanned by mount (3413231)
[79195.419515] BTRFS info (device sdb): first mount of filesystem
45ac9151-7e2d-4dd7-bf75-99967f869f2a
[79195.419525] BTRFS info (device sdb): using crc32c (crc32c-intel)
checksum algorithm
[79195.419529] BTRFS info (device sdb): using free-space-tree
[79195.612719] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828
devid 1 transid 6 /dev/sdc scanned by mkfs.btrfs (3413318)
[79195.666279] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828
devid 1 transid 6 /dev/dm-0 scanned by systemd-udevd (3410982)
[79195.695774] BTRFS info (device dm-0): first mount of filesystem
10184d7d-3ca9-43c1-a6f8-70b134cff828
[79195.695786] BTRFS info (device dm-0): using crc32c (crc32c-intel)
checksum algorithm
[79195.695789] BTRFS error (device dm-0): superblock fsid doesn't
match fsid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 !=
628aff33-4122-4d77-b2a9-2e9a90f27520
[79195.696098] BTRFS error (device dm-0): superblock metadata_uuid
doesn't match metadata uuid of fs_devices:
10184d7d-3ca9-43c1-a6f8-70b134cff828 !=
628aff33-4122-4d77-b2a9-2e9a90f27520
[79195.696419] BTRFS error (device dm-0): dev_item UUID does not match
metadata fsid: 628aff33-4122-4d77-b2a9-2e9a90f27520 !=
10184d7d-3ca9-43c1-a6f8-70b134cff828
[79195.696765] BTRFS error (device dm-0): superblock contains fatal errors
[79195.697447] BTRFS error (device dm-0): open_ctree failed

It always happens with this patch applied in for-next, and never
happens with it reverted.

>
David Sterba Feb. 20, 2024, 6:12 p.m. UTC | #3
On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote:
> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
> > On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
> > https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
> 
> So this introduces a regression.
> 
> $ ./check btrfs/14[6-9] btrfs/15[8-9]

Thanks, with this I can reproduce it and have some ideas what could go
wrong.
Anand Jain Feb. 21, 2024, 4:39 p.m. UTC | #4
On 2/20/24 23:42, David Sterba wrote:
> On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote:
>> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
>>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
>>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
>>
>> So this introduces a regression.
>>
>> $ ./check btrfs/14[6-9] btrfs/15[8-9]
> 
> Thanks, with this I can reproduce it and have some ideas what could go
> wrong.


Thanks indeed.

I see some error during mkfs.

ERROR: /dev/sdb1 is smaller than requested size, expected 1073741824, 
found 766509056

-Anand
Anand Jain Feb. 21, 2024, 5:03 p.m. UTC | #5
On 2/21/24 22:09, Anand Jain wrote:
> On 2/20/24 23:42, David Sterba wrote:
>> On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote:
>>> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
>>>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
>>>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
>>>
>>> So this introduces a regression.
>>>
>>> $ ./check btrfs/14[6-9] btrfs/15[8-9]
>>
>> Thanks, with this I can reproduce it and have some ideas what could go
>> wrong.
> 
> 
> Thanks indeed.
> 
> I see some error during mkfs.
> 
> ERROR: /dev/sdb1 is smaller than requested size, expected 1073741824, 
> found 766509056

It is reproducing the wrong problem. Please ignore.
David Sterba Feb. 21, 2024, 9:49 p.m. UTC | #6
On Wed, Feb 21, 2024 at 10:09:59PM +0530, Anand Jain wrote:
> On 2/20/24 23:42, David Sterba wrote:
> > On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote:
> >> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
> >>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
> >>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
> >>
> >> So this introduces a regression.
> >>
> >> $ ./check btrfs/14[6-9] btrfs/15[8-9]
> > 
> > Thanks, with this I can reproduce it and have some ideas what could go
> > wrong.
> 
> Thanks indeed.

I tested the following, it fixes the fsid problems and has passed full
fstests run. The temp-fsid test coverage needs to be done still.

@@ -1388,6 +1388,10 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
        if (ret)
                btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
                           path, ret);
+       if (devt) {
+               printk(KERN_ERR "free stale devt (for path %s)\n", path);
+               btrfs_free_stale_devices(devt, NULL);
+       }
Anand Jain Feb. 22, 2024, 2:45 a.m. UTC | #7
On 2/22/24 03:19, David Sterba wrote:
> On Wed, Feb 21, 2024 at 10:09:59PM +0530, Anand Jain wrote:
>> On 2/20/24 23:42, David Sterba wrote:
>>> On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote:
>>>> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote:
>>>>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote:
>>>>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
>>>>
>>>> So this introduces a regression.
>>>>
>>>> $ ./check btrfs/14[6-9] btrfs/15[8-9]
>>>
>>> Thanks, with this I can reproduce it and have some ideas what could go
>>> wrong.
>>
>> Thanks indeed.
> 
> I tested the following, it fixes the fsid problems and has passed full
> fstests run. The temp-fsid test coverage needs to be done still.
> 
> @@ -1388,6 +1388,10 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>          if (ret)
>                  btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
>                             path, ret);
> +       if (devt) {
> +               printk(KERN_ERR "free stale devt (for path %s)\n", path);
> +               btrfs_free_stale_devices(devt, NULL);
> +       }

Right. I had this in mind to check for the stale devices. I'll do.

Anand
Anand Jain Feb. 22, 2024, 9:07 p.m. UTC | #8
>>>> $ ./check btrfs/14[6-9] btrfs/15[8-9]
>>>>

The stale fsid matches the testcase btrfs/146.
Unfortunately, the failure is inconsistent.
I will take another look tomorrow.

Thanks, Anand


>>>> Thanks, with this I can reproduce it and have some ideas what could go
>>>> wrong.
>>>
>>> Thanks indeed.
>>
>> I tested the following, it fixes the fsid problems and has passed full
>> fstests run. The temp-fsid test coverage needs to be done still.
>>
>> @@ -1388,6 +1388,10 @@ struct btrfs_device 
>> *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>          if (ret)
>>                  btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
>>                             path, ret);
>> +       if (devt) {
>> +               printk(KERN_ERR "free stale devt (for path %s)\n", path);
>> +               btrfs_free_stale_devices(devt, NULL);
>> +       }
> 
> Right. I had this in mind to check for the stale devices. I'll do.
> 
> Anand
Anand Jain March 7, 2024, 4:13 a.m. UTC | #9
The problem has been highly inconsistent to reproduce. I apologize
for the delay in sending out the fix.

<snap>

> $ ./check btrfs/14[6-9] btrfs/15[8-9]
> FSTYP         -- btrfs
> PLATFORM      -- Linux/x86_64 debian0 6.8.0-rc5-btrfs-next-151+ #1 SMP
> PREEMPT_DYNAMIC Mon Feb 19 13:38:37 WET 2024
> MKFS_OPTIONS  -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> 
> btrfs/146 1s ...  2s
> btrfs/147 0s ...  1s
> btrfs/148 2s ...  2s
> btrfs/149 1s ...  1s
> btrfs/158 1s ...  0s
> btrfs/159 20s ... - output mismatch (see
> /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad)
>      --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000
>      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad
> 2024-02-20 13:51:25.707220763 +0000
>      @@ -1,8 +1,11 @@
>       QA output created by 159
>      +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad
> option, bad superblock on /dev/mapper/flakey-test, missing codepage or
> helper program, or other error.
>      +       dmesg(1) may have more information after failed mount system call.
>       File digest before power failure:
>      -f049865ed45b1991dc9a299b47d51dbf  SCRATCH_MNT/foobar
>      +b2e8facfb4795185fadd85707fe78973  SCRATCH_MNT/foobar
>      +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted.
>      ...
>      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out
> /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad'  to see
> the entire diff)
> Ran: btrfs/146 btrfs/147 btrfs/148 btrfs/149 btrfs/158 btrfs/159
> 

<snap>

btrfs/159 does

         _scratch_mkfs -O no-holes -n $((64 * 1024)) >>$seqres.full 2>&1
         _require_metadata_journaling $SCRATCH_DEV
         _init_flakey
         _mount_flakey

> [79195.612719] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828
> devid 1 transid 6 /dev/sdc scanned by mkfs.btrfs (3413318)
> [79195.666279] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828
> devid 1 transid 6 /dev/dm-0 scanned by systemd-udevd (3410982)

Both /dev/sdc and /dev/dm-0 get scanned, and the tempfsid
gets activated wrongly.

The fix is to add another criterion to check if the device is
already mounted; then only let the thread update the device path.
However, I'm not sure if it will fix the original problem
(update-grub). I have sent an RFC patch v3 for verification.

Thanks,  Anand


> [79195.695774] BTRFS info (device dm-0): first mount of filesystem
> 10184d7d-3ca9-43c1-a6f8-70b134cff828
> [79195.695786] BTRFS info (device dm-0): using crc32c (crc32c-intel)
> checksum algorithm
> [79195.695789] BTRFS error (device dm-0): superblock fsid doesn't
> match fsid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 !=
> 628aff33-4122-4d77-b2a9-2e9a90f27520
> [79195.696098] BTRFS error (device dm-0): superblock metadata_uuid
> doesn't match metadata uuid of fs_devices:
> 10184d7d-3ca9-43c1-a6f8-70b134cff828 !=
> 628aff33-4122-4d77-b2a9-2e9a90f27520
> [79195.696419] BTRFS error (device dm-0): dev_item UUID does not match
> metadata fsid: 628aff33-4122-4d77-b2a9-2e9a90f27520 !=
> 10184d7d-3ca9-43c1-a6f8-70b134cff828
> [79195.696765] BTRFS error (device dm-0): superblock contains fatal errors
> [79195.697447] BTRFS error (device dm-0): open_ctree failed
> 
> It always happens with this patch applied in for-next, and never
> happens with it reverted.
> 
>>
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 474ab7ed65ea..192c540a650c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1299,6 +1299,31 @@  int btrfs_forget_devices(dev_t devt)
 	return ret;
 }
 
+static bool btrfs_skip_registration(struct btrfs_super_block *disk_super,
+				    dev_t devt, bool mount_arg_dev)
+{
+	struct btrfs_fs_devices *fs_devices;
+
+	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+		struct btrfs_device *device;
+
+		mutex_lock(&fs_devices->device_list_mutex);
+		list_for_each_entry(device, &fs_devices->devices, dev_list) {
+			if (device->devt == devt) {
+				mutex_unlock(&fs_devices->device_list_mutex);
+				return false;
+			}
+		}
+		mutex_unlock(&fs_devices->device_list_mutex);
+	}
+
+	if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 &&
+	    !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING))
+		return true;
+
+	return false;
+}
+
 /*
  * Look for a btrfs signature on a device. This may be called out of the mount path
  * and we are not allowed to call set_blocksize during the scan. The superblock
@@ -1316,6 +1341,7 @@  struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 	struct btrfs_device *device = NULL;
 	struct bdev_handle *bdev_handle;
 	u64 bytenr, bytenr_orig;
+	dev_t devt = 0;
 	int ret;
 
 	lockdep_assert_held(&uuid_mutex);
@@ -1355,18 +1381,16 @@  struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 		goto error_bdev_put;
 	}
 
-	if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 &&
-	    !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING)) {
-		dev_t devt;
+	ret = lookup_bdev(path, &devt);
+	if (ret)
+		btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
+			   path, ret);
 
-		ret = lookup_bdev(path, &devt);
-		if (ret)
-			btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
-				   path, ret);
-		else
+	if (btrfs_skip_registration(disk_super, devt, mount_arg_dev)) {
+		pr_debug("BTRFS: skip registering single non-seed device %s\n",
+			  path);
+		if (devt)
 			btrfs_free_stale_devices(devt, NULL);
-
-		pr_debug("BTRFS: skip registering single non-seed device %s\n", path);
 		device = NULL;
 		goto free_disk_super;
 	}