Message ID | 88673c60b1d866c289ef019945647adfc8ab51d0.1707781507.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: do not skip re-registration for the mounted device | expand |
On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: > There are reports that since version 6.7 update-grub fails to find the > device of the root on systems without initrd and on a single device. > > This looks like the device name changed in the output of > /proc/self/mountinfo: > > 6.5-rc5 working > > 18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ... > > 6.7 not working: > > 17 1 0:15 / / rw,noatime - btrfs /dev/root ... > > and "update-grub" shows this error: > > /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?) > > This looks like it's related to the device name, but grub-probe > recognizes the "/dev/root" path and tries to find the underlying device. > However there's a special case for some filesystems, for btrfs in > particular. > > The generic root device detection heuristic is not done and it all > relies on reading the device infos by a btrfs specific ioctl. This ioctl > returns the device name as it was saved at the time of device scan (in > this case it's /dev/root). > > The change in 6.7 for temp_fsid to allow several single device > filesystem to exist with the same fsid (and transparently generate a new > UUID at mount time) was to skip caching/registering such devices. > > This also skipped mounted device. One step of scanning is to check if > the device name hasn't changed, and if yes then update the cached value. > > This broke the grub-probe as it always read the device /dev/root and > couldn't find it in the system. A temporary workaround is to create a > symlink but this does not survive reboot. > > The right fix is to allow updating the device path of a mounted > filesystem even if this is a single device one. > > In the fix, check if the device's major:minor number matches with the > cached device. If they do, then we can allow the scan to happen so that > device_list_add() can take care of updating the device path. The file > descriptor remains unchanged. > > This does not affect the temp_fsid feature, the UUID of the mounted > filesystem remains the same and the matching is based on device major:minor > which is unique per mounted filesystem. > > This covers the path when the device (that exists for all mounted > devices) name changes, updating /dev/root to /dev/sdx. Any other single > device with filesystem and is not mounted is still skipped. > > Note that if a system is booted and initial mount is done on the > /dev/root device, this will be the cached name of the device. Only after > the command "btrfs device scan" it will change as it triggers the > rename. > > The fix was verified by users whose systems were affected. > > CC: stable@vger.kernel.org # 6.7+ > Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem") > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353 > Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/ > Signed-off-by: Anand Jain <anand.jain@oracle.com> > Tested-by: Alex Romosan <aromosan@gmail.com> > Tested-by: CHECK_1234543212345@protonmail.com Reviewed-by: David Sterba <dsterba@suse.com> When you commit the patch, please reorder the tags according to https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering
On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: > > On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: > > There are reports that since version 6.7 update-grub fails to find the > > device of the root on systems without initrd and on a single device. > > > > This looks like the device name changed in the output of > > /proc/self/mountinfo: > > > > 6.5-rc5 working > > > > 18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ... > > > > 6.7 not working: > > > > 17 1 0:15 / / rw,noatime - btrfs /dev/root ... > > > > and "update-grub" shows this error: > > > > /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?) > > > > This looks like it's related to the device name, but grub-probe > > recognizes the "/dev/root" path and tries to find the underlying device. > > However there's a special case for some filesystems, for btrfs in > > particular. > > > > The generic root device detection heuristic is not done and it all > > relies on reading the device infos by a btrfs specific ioctl. This ioctl > > returns the device name as it was saved at the time of device scan (in > > this case it's /dev/root). > > > > The change in 6.7 for temp_fsid to allow several single device > > filesystem to exist with the same fsid (and transparently generate a new > > UUID at mount time) was to skip caching/registering such devices. > > > > This also skipped mounted device. One step of scanning is to check if > > the device name hasn't changed, and if yes then update the cached value. > > > > This broke the grub-probe as it always read the device /dev/root and > > couldn't find it in the system. A temporary workaround is to create a > > symlink but this does not survive reboot. > > > > The right fix is to allow updating the device path of a mounted > > filesystem even if this is a single device one. > > > > In the fix, check if the device's major:minor number matches with the > > cached device. If they do, then we can allow the scan to happen so that > > device_list_add() can take care of updating the device path. The file > > descriptor remains unchanged. > > > > This does not affect the temp_fsid feature, the UUID of the mounted > > filesystem remains the same and the matching is based on device major:minor > > which is unique per mounted filesystem. > > > > This covers the path when the device (that exists for all mounted > > devices) name changes, updating /dev/root to /dev/sdx. Any other single > > device with filesystem and is not mounted is still skipped. > > > > Note that if a system is booted and initial mount is done on the > > /dev/root device, this will be the cached name of the device. Only after > > the command "btrfs device scan" it will change as it triggers the > > rename. > > > > The fix was verified by users whose systems were affected. > > > > CC: stable@vger.kernel.org # 6.7+ > > Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem") > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353 > > Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/ > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > > Tested-by: Alex Romosan <aromosan@gmail.com> > > Tested-by: CHECK_1234543212345@protonmail.com > > Reviewed-by: David Sterba <dsterba@suse.com> > > When you commit the patch, please reorder the tags according to > https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering So this introduces a regression. Running fstests: (...) btrfs/156 11s ... 16s btrfs/157 3s ... 0s btrfs/158 0s ... 2s btrfs/159 16s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad) --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad 2024-02-20 12:54:43.386131546 +0000 @@ -1,8 +1,11 @@ QA output created by 159 +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad option, bad superblock on /dev/mapper/flakey-test, missing codepage or helper program, or other error. + dmesg(1) may have more information after failed mount system call. File digest before power failure: -f049865ed45b1991dc9a299b47d51dbf SCRATCH_MNT/foobar +b2e8facfb4795185fadd85707fe78973 SCRATCH_MNT/foobar +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted. ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad' to see the entire diff) btrfs/160 4s ... 4s (...) The weird thing is it doesn't happen when running btrfs/159 standalone (even if doing a rmmod btrfs before). Instead of running all tests, I managed to reproduce it with only: $ ./check btrfs/14[6-9] btrfs/15[8-9] FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian0 6.8.0-rc5-btrfs-next-151+ #1 SMP PREEMPT_DYNAMIC Mon Feb 19 13:38:37 WET 2024 MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 btrfs/146 1s ... 2s btrfs/147 0s ... 1s btrfs/148 2s ... 2s btrfs/149 1s ... 1s btrfs/158 1s ... 0s btrfs/159 20s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad) --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad 2024-02-20 13:51:25.707220763 +0000 @@ -1,8 +1,11 @@ QA output created by 159 +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad option, bad superblock on /dev/mapper/flakey-test, missing codepage or helper program, or other error. + dmesg(1) may have more information after failed mount system call. File digest before power failure: -f049865ed45b1991dc9a299b47d51dbf SCRATCH_MNT/foobar +b2e8facfb4795185fadd85707fe78973 SCRATCH_MNT/foobar +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted. ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad' to see the entire diff) Ran: btrfs/146 btrfs/147 btrfs/148 btrfs/149 btrfs/158 btrfs/159 dmesg shows: [79195.239769] run fstests btrfs/159 at 2024-02-20 14:06:02 [79195.418917] BTRFS: device fsid 45ac9151-7e2d-4dd7-bf75-99967f869f2a devid 1 transid 3747 /dev/sdb scanned by mount (3413231) [79195.419515] BTRFS info (device sdb): first mount of filesystem 45ac9151-7e2d-4dd7-bf75-99967f869f2a [79195.419525] BTRFS info (device sdb): using crc32c (crc32c-intel) checksum algorithm [79195.419529] BTRFS info (device sdb): using free-space-tree [79195.612719] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 devid 1 transid 6 /dev/sdc scanned by mkfs.btrfs (3413318) [79195.666279] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 devid 1 transid 6 /dev/dm-0 scanned by systemd-udevd (3410982) [79195.695774] BTRFS info (device dm-0): first mount of filesystem 10184d7d-3ca9-43c1-a6f8-70b134cff828 [79195.695786] BTRFS info (device dm-0): using crc32c (crc32c-intel) checksum algorithm [79195.695789] BTRFS error (device dm-0): superblock fsid doesn't match fsid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 != 628aff33-4122-4d77-b2a9-2e9a90f27520 [79195.696098] BTRFS error (device dm-0): superblock metadata_uuid doesn't match metadata uuid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 != 628aff33-4122-4d77-b2a9-2e9a90f27520 [79195.696419] BTRFS error (device dm-0): dev_item UUID does not match metadata fsid: 628aff33-4122-4d77-b2a9-2e9a90f27520 != 10184d7d-3ca9-43c1-a6f8-70b134cff828 [79195.696765] BTRFS error (device dm-0): superblock contains fatal errors [79195.697447] BTRFS error (device dm-0): open_ctree failed It always happens with this patch applied in for-next, and never happens with it reverted. >
On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote: > On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: > > On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: > > https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering > > So this introduces a regression. > > $ ./check btrfs/14[6-9] btrfs/15[8-9] Thanks, with this I can reproduce it and have some ideas what could go wrong.
On 2/20/24 23:42, David Sterba wrote: > On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote: >> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: >>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: >>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering >> >> So this introduces a regression. >> >> $ ./check btrfs/14[6-9] btrfs/15[8-9] > > Thanks, with this I can reproduce it and have some ideas what could go > wrong. Thanks indeed. I see some error during mkfs. ERROR: /dev/sdb1 is smaller than requested size, expected 1073741824, found 766509056 -Anand
On 2/21/24 22:09, Anand Jain wrote: > On 2/20/24 23:42, David Sterba wrote: >> On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote: >>> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: >>>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: >>>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering >>> >>> So this introduces a regression. >>> >>> $ ./check btrfs/14[6-9] btrfs/15[8-9] >> >> Thanks, with this I can reproduce it and have some ideas what could go >> wrong. > > > Thanks indeed. > > I see some error during mkfs. > > ERROR: /dev/sdb1 is smaller than requested size, expected 1073741824, > found 766509056 It is reproducing the wrong problem. Please ignore.
On Wed, Feb 21, 2024 at 10:09:59PM +0530, Anand Jain wrote: > On 2/20/24 23:42, David Sterba wrote: > > On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote: > >> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: > >>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: > >>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering > >> > >> So this introduces a regression. > >> > >> $ ./check btrfs/14[6-9] btrfs/15[8-9] > > > > Thanks, with this I can reproduce it and have some ideas what could go > > wrong. > > Thanks indeed. I tested the following, it fixes the fsid problems and has passed full fstests run. The temp-fsid test coverage needs to be done still. @@ -1388,6 +1388,10 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags, if (ret) btrfs_warn(NULL, "lookup bdev failed for path %s: %d", path, ret); + if (devt) { + printk(KERN_ERR "free stale devt (for path %s)\n", path); + btrfs_free_stale_devices(devt, NULL); + }
On 2/22/24 03:19, David Sterba wrote: > On Wed, Feb 21, 2024 at 10:09:59PM +0530, Anand Jain wrote: >> On 2/20/24 23:42, David Sterba wrote: >>> On Tue, Feb 20, 2024 at 02:08:00PM +0000, Filipe Manana wrote: >>>> On Wed, Feb 14, 2024 at 7:17 AM David Sterba <dsterba@suse.cz> wrote: >>>>> On Tue, Feb 13, 2024 at 09:13:56AM +0800, Anand Jain wrote: >>>>> https://btrfs.readthedocs.io/en/latest/dev/Developer-s-FAQ.html#ordering >>>> >>>> So this introduces a regression. >>>> >>>> $ ./check btrfs/14[6-9] btrfs/15[8-9] >>> >>> Thanks, with this I can reproduce it and have some ideas what could go >>> wrong. >> >> Thanks indeed. > > I tested the following, it fixes the fsid problems and has passed full > fstests run. The temp-fsid test coverage needs to be done still. > > @@ -1388,6 +1388,10 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags, > if (ret) > btrfs_warn(NULL, "lookup bdev failed for path %s: %d", > path, ret); > + if (devt) { > + printk(KERN_ERR "free stale devt (for path %s)\n", path); > + btrfs_free_stale_devices(devt, NULL); > + } Right. I had this in mind to check for the stale devices. I'll do. Anand
>>>> $ ./check btrfs/14[6-9] btrfs/15[8-9] >>>> The stale fsid matches the testcase btrfs/146. Unfortunately, the failure is inconsistent. I will take another look tomorrow. Thanks, Anand >>>> Thanks, with this I can reproduce it and have some ideas what could go >>>> wrong. >>> >>> Thanks indeed. >> >> I tested the following, it fixes the fsid problems and has passed full >> fstests run. The temp-fsid test coverage needs to be done still. >> >> @@ -1388,6 +1388,10 @@ struct btrfs_device >> *btrfs_scan_one_device(const char *path, blk_mode_t flags, >> if (ret) >> btrfs_warn(NULL, "lookup bdev failed for path %s: %d", >> path, ret); >> + if (devt) { >> + printk(KERN_ERR "free stale devt (for path %s)\n", path); >> + btrfs_free_stale_devices(devt, NULL); >> + } > > Right. I had this in mind to check for the stale devices. I'll do. > > Anand
The problem has been highly inconsistent to reproduce. I apologize for the delay in sending out the fix. <snap> > $ ./check btrfs/14[6-9] btrfs/15[8-9] > FSTYP -- btrfs > PLATFORM -- Linux/x86_64 debian0 6.8.0-rc5-btrfs-next-151+ #1 SMP > PREEMPT_DYNAMIC Mon Feb 19 13:38:37 WET 2024 > MKFS_OPTIONS -- /dev/sdc > MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 > > btrfs/146 1s ... 2s > btrfs/147 0s ... 1s > btrfs/148 2s ... 2s > btrfs/149 1s ... 1s > btrfs/158 1s ... 0s > btrfs/159 20s ... - output mismatch (see > /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad) > --- tests/btrfs/159.out 2020-10-26 15:31:57.061207266 +0000 > +++ /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad > 2024-02-20 13:51:25.707220763 +0000 > @@ -1,8 +1,11 @@ > QA output created by 159 > +mount: /home/fdmanana/btrfs-tests/scratch_1: wrong fs type, bad > option, bad superblock on /dev/mapper/flakey-test, missing codepage or > helper program, or other error. > + dmesg(1) may have more information after failed mount system call. > File digest before power failure: > -f049865ed45b1991dc9a299b47d51dbf SCRATCH_MNT/foobar > +b2e8facfb4795185fadd85707fe78973 SCRATCH_MNT/foobar > +umount: /home/fdmanana/btrfs-tests/scratch_1: not mounted. > ... > (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/159.out > /home/fdmanana/git/hub/xfstests/results//btrfs/159.out.bad' to see > the entire diff) > Ran: btrfs/146 btrfs/147 btrfs/148 btrfs/149 btrfs/158 btrfs/159 > <snap> btrfs/159 does _scratch_mkfs -O no-holes -n $((64 * 1024)) >>$seqres.full 2>&1 _require_metadata_journaling $SCRATCH_DEV _init_flakey _mount_flakey > [79195.612719] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 > devid 1 transid 6 /dev/sdc scanned by mkfs.btrfs (3413318) > [79195.666279] BTRFS: device fsid 10184d7d-3ca9-43c1-a6f8-70b134cff828 > devid 1 transid 6 /dev/dm-0 scanned by systemd-udevd (3410982) Both /dev/sdc and /dev/dm-0 get scanned, and the tempfsid gets activated wrongly. The fix is to add another criterion to check if the device is already mounted; then only let the thread update the device path. However, I'm not sure if it will fix the original problem (update-grub). I have sent an RFC patch v3 for verification. Thanks, Anand > [79195.695774] BTRFS info (device dm-0): first mount of filesystem > 10184d7d-3ca9-43c1-a6f8-70b134cff828 > [79195.695786] BTRFS info (device dm-0): using crc32c (crc32c-intel) > checksum algorithm > [79195.695789] BTRFS error (device dm-0): superblock fsid doesn't > match fsid of fs_devices: 10184d7d-3ca9-43c1-a6f8-70b134cff828 != > 628aff33-4122-4d77-b2a9-2e9a90f27520 > [79195.696098] BTRFS error (device dm-0): superblock metadata_uuid > doesn't match metadata uuid of fs_devices: > 10184d7d-3ca9-43c1-a6f8-70b134cff828 != > 628aff33-4122-4d77-b2a9-2e9a90f27520 > [79195.696419] BTRFS error (device dm-0): dev_item UUID does not match > metadata fsid: 628aff33-4122-4d77-b2a9-2e9a90f27520 != > 10184d7d-3ca9-43c1-a6f8-70b134cff828 > [79195.696765] BTRFS error (device dm-0): superblock contains fatal errors > [79195.697447] BTRFS error (device dm-0): open_ctree failed > > It always happens with this patch applied in for-next, and never > happens with it reverted. > >>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 474ab7ed65ea..192c540a650c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1299,6 +1299,31 @@ int btrfs_forget_devices(dev_t devt) return ret; } +static bool btrfs_skip_registration(struct btrfs_super_block *disk_super, + dev_t devt, bool mount_arg_dev) +{ + struct btrfs_fs_devices *fs_devices; + + list_for_each_entry(fs_devices, &fs_uuids, fs_list) { + struct btrfs_device *device; + + mutex_lock(&fs_devices->device_list_mutex); + list_for_each_entry(device, &fs_devices->devices, dev_list) { + if (device->devt == devt) { + mutex_unlock(&fs_devices->device_list_mutex); + return false; + } + } + mutex_unlock(&fs_devices->device_list_mutex); + } + + if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 && + !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING)) + return true; + + return false; +} + /* * Look for a btrfs signature on a device. This may be called out of the mount path * and we are not allowed to call set_blocksize during the scan. The superblock @@ -1316,6 +1341,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags, struct btrfs_device *device = NULL; struct bdev_handle *bdev_handle; u64 bytenr, bytenr_orig; + dev_t devt = 0; int ret; lockdep_assert_held(&uuid_mutex); @@ -1355,18 +1381,16 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags, goto error_bdev_put; } - if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 && - !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING)) { - dev_t devt; + ret = lookup_bdev(path, &devt); + if (ret) + btrfs_warn(NULL, "lookup bdev failed for path %s: %d", + path, ret); - ret = lookup_bdev(path, &devt); - if (ret) - btrfs_warn(NULL, "lookup bdev failed for path %s: %d", - path, ret); - else + if (btrfs_skip_registration(disk_super, devt, mount_arg_dev)) { + pr_debug("BTRFS: skip registering single non-seed device %s\n", + path); + if (devt) btrfs_free_stale_devices(devt, NULL); - - pr_debug("BTRFS: skip registering single non-seed device %s\n", path); device = NULL; goto free_disk_super; }