Message ID | 02d59bdd7a8b778deb17e300354558498db59376.1692178060.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: reject device with CHANGING_FSID_V2 | expand |
On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: > The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state > where the device in the userspace btrfstune -m|-M operation failed to > complete changing the fsid. > > This flag makes the kernel to automatically determine the other > partner devices to which a given device can be associated, based on the > fsid, metadata_uuid and generation values. > > btrfstune -m|M feature is especially useful in virtual cloud setups, where > compute instances (disk images) are quickly copied, fsid changed, and > launched. Given numerous disk images with the same metadata_uuid but > different fsid, there's no clear way a device can be correctly assembled > with the proper partners when the CHANGING_FSID_V2 flag is set. So, the > disk could be assembled incorrectly, as in the example below: > > Before this patch: > > Consider the following two filesystems: > /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m > operation fails. > > In this scenario, as the /dev/loop0's fsid change is interrupted, and the > CHANGING_FSID_V2 flag is set as shown below. > > $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" > > $ btrfs inspect dump-super /dev/loop0 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop0 > flags 0x1000000001 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 9 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 1 > > $ btrfs inspect dump-super /dev/loop1 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop1 > flags 0x1 > fsid 11d2af4d-1b71-45a9-83f6-f2100766939d > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 10 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 2 > > $ btrfs inspect dump-super /dev/loop2 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop2 > flags 0x1 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 8 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 1 > > $ btrfs inspect dump-super /dev/loop3 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop3 > flags 0x1 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 8 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 2 > > > It is normal that some devices aren't instantly discovered during > system boot or iSCSI discovery. The controlled scan below demonstrates > this. > > $ btrfs device scan --forget > $ btrfs device scan /dev/loop0 > Scanning for btrfs filesystems on '/dev/loop0' > $ mount /dev/loop3 /btrfs > $ btrfs filesystem show -m > Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > Total devices 2 FS bytes used 144.00KiB > devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 > devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 > > /dev/loop0 and /dev/loop3 are incorrectly partnered. > > This kernel patch removes functions and code connected to the > CHANGING_FSID_V2 flag. I didn't have a closer look but it seems you're removing all the logic to make the metadata uuid robust and usable in case of interrupted conversion, while finding another case where it does not work as you expect. With this it would be change in behaviour, we need to check the original use case. IIRC as the metadata uuid change is lightweight we want to try harder to deal with the easy errors instead of rejecting the filesystem mount.
On 17/8/23 20:04, David Sterba wrote: > On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: >> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state >> where the device in the userspace btrfstune -m|-M operation failed to >> complete changing the fsid. >> >> This flag makes the kernel to automatically determine the other >> partner devices to which a given device can be associated, based on the >> fsid, metadata_uuid and generation values. >> >> btrfstune -m|M feature is especially useful in virtual cloud setups, where >> compute instances (disk images) are quickly copied, fsid changed, and >> launched. Given numerous disk images with the same metadata_uuid but >> different fsid, there's no clear way a device can be correctly assembled >> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the >> disk could be assembled incorrectly, as in the example below: >> >> Before this patch: >> >> Consider the following two filesystems: >> /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m >> operation fails. >> >> In this scenario, as the /dev/loop0's fsid change is interrupted, and the >> CHANGING_FSID_V2 flag is set as shown below. >> >> $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" >> >> $ btrfs inspect dump-super /dev/loop0 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop0 >> flags 0x1000000001 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 9 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 1 >> >> $ btrfs inspect dump-super /dev/loop1 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop1 >> flags 0x1 >> fsid 11d2af4d-1b71-45a9-83f6-f2100766939d >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 10 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 2 >> >> $ btrfs inspect dump-super /dev/loop2 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop2 >> flags 0x1 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 8 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 1 >> >> $ btrfs inspect dump-super /dev/loop3 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop3 >> flags 0x1 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 8 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 2 >> >> >> It is normal that some devices aren't instantly discovered during >> system boot or iSCSI discovery. The controlled scan below demonstrates >> this. >> >> $ btrfs device scan --forget >> $ btrfs device scan /dev/loop0 >> Scanning for btrfs filesystems on '/dev/loop0' >> $ mount /dev/loop3 /btrfs >> $ btrfs filesystem show -m >> Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> Total devices 2 FS bytes used 144.00KiB >> devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 >> devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 >> >> /dev/loop0 and /dev/loop3 are incorrectly partnered. >> >> This kernel patch removes functions and code connected to the >> CHANGING_FSID_V2 flag. > > I didn't have a closer look but it seems you're removing all the logic > to make the metadata uuid robust and usable in case of interrupted > conversion, while finding another case where it does not work as you > expect. With this it would be change in behaviour, we need to check > the original use case. IIRC as the metadata uuid change is lightweight > we want to try harder to deal with the easy errors instead of rejecting > the filesystem mount. Robust indeed. Silently assembling wrong devices-data loss risk? Failing to assemble is still safe. I think it is better to introduce a sub-command to clone btrfs filesystem with a new device-uuid and same fsid (as it looks like same fsid has some use case). Thanks, Anand
On 2023/8/18 01:19, Anand Jain wrote: > > > On 17/8/23 20:04, David Sterba wrote: >> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: >>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state >>> where the device in the userspace btrfstune -m|-M operation failed to >>> complete changing the fsid. >>> >>> This flag makes the kernel to automatically determine the other >>> partner devices to which a given device can be associated, based on the >>> fsid, metadata_uuid and generation values. >>> >>> btrfstune -m|M feature is especially useful in virtual cloud setups, >>> where >>> compute instances (disk images) are quickly copied, fsid changed, and >>> launched. Given numerous disk images with the same metadata_uuid but >>> different fsid, there's no clear way a device can be correctly assembled >>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the >>> disk could be assembled incorrectly, as in the example below: >>> >>> Before this patch: >>> >>> Consider the following two filesystems: >>> /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m >>> operation fails. >>> >>> In this scenario, as the /dev/loop0's fsid change is interrupted, and >>> the >>> CHANGING_FSID_V2 flag is set as shown below. >>> >>> $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" >>> >>> $ btrfs inspect dump-super /dev/loop0 | egrep '$p' >>> superblock: bytenr=65536, device=/dev/loop0 >>> flags 0x1000000001 >>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>> generation 9 >>> num_devices 2 >>> incompat_flags 0x741 >>> dev_item.devid 1 >>> >>> $ btrfs inspect dump-super /dev/loop1 | egrep '$p' >>> superblock: bytenr=65536, device=/dev/loop1 >>> flags 0x1 >>> fsid 11d2af4d-1b71-45a9-83f6-f2100766939d >>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>> generation 10 >>> num_devices 2 >>> incompat_flags 0x741 >>> dev_item.devid 2 >>> >>> $ btrfs inspect dump-super /dev/loop2 | egrep '$p' >>> superblock: bytenr=65536, device=/dev/loop2 >>> flags 0x1 >>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>> generation 8 >>> num_devices 2 >>> incompat_flags 0x741 >>> dev_item.devid 1 >>> >>> $ btrfs inspect dump-super /dev/loop3 | egrep '$p' >>> superblock: bytenr=65536, device=/dev/loop3 >>> flags 0x1 >>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>> generation 8 >>> num_devices 2 >>> incompat_flags 0x741 >>> dev_item.devid 2 >>> >>> >>> It is normal that some devices aren't instantly discovered during >>> system boot or iSCSI discovery. The controlled scan below demonstrates >>> this. >>> >>> $ btrfs device scan --forget >>> $ btrfs device scan /dev/loop0 >>> Scanning for btrfs filesystems on '/dev/loop0' >>> $ mount /dev/loop3 /btrfs >>> $ btrfs filesystem show -m >>> Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>> Total devices 2 FS bytes used 144.00KiB >>> devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 >>> devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 >>> >>> /dev/loop0 and /dev/loop3 are incorrectly partnered. >>> >>> This kernel patch removes functions and code connected to the >>> CHANGING_FSID_V2 flag. >> >> I didn't have a closer look but it seems you're removing all the logic >> to make the metadata uuid robust and usable in case of interrupted >> conversion, while finding another case where it does not work as you >> expect. With this it would be change in behaviour, we need to check >> the original use case. IIRC as the metadata uuid change is lightweight >> we want to try harder to deal with the easy errors instead of rejecting >> the filesystem mount. > > Robust indeed. Silently assembling wrong devices-data loss risk? > Failing to assemble is still safe. > > I think it is better to introduce a sub-command to clone btrfs > filesystem with a new device-uuid and same fsid (as it looks like > same fsid has some use case). > > Thanks, Anand Oh, my memory comes back, the original design for the two stage commitment is to avoid split brain cases when one device is committed with the new flag, while the remaining one doesn't. With the extra stage, even if at stage 1 or 2 the transaction is interrupted and only one device got the new flag, it can help us to locate the stage and recover. Thanks, Qu
On 18/08/2023 08:21, Qu Wenruo wrote: > > > On 2023/8/18 01:19, Anand Jain wrote: >> >> >> On 17/8/23 20:04, David Sterba wrote: >>> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: >>>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state >>>> where the device in the userspace btrfstune -m|-M operation failed to >>>> complete changing the fsid. >>>> >>>> This flag makes the kernel to automatically determine the other >>>> partner devices to which a given device can be associated, based on the >>>> fsid, metadata_uuid and generation values. >>>> >>>> btrfstune -m|M feature is especially useful in virtual cloud setups, >>>> where >>>> compute instances (disk images) are quickly copied, fsid changed, and >>>> launched. Given numerous disk images with the same metadata_uuid but >>>> different fsid, there's no clear way a device can be correctly >>>> assembled >>>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the >>>> disk could be assembled incorrectly, as in the example below: >>>> >>>> Before this patch: >>>> >>>> Consider the following two filesystems: >>>> /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the >>>> btrsftune -m >>>> operation fails. >>>> >>>> In this scenario, as the /dev/loop0's fsid change is interrupted, >>>> and the >>>> CHANGING_FSID_V2 flag is set as shown below. >>>> >>>> $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" >>>> >>>> $ btrfs inspect dump-super /dev/loop0 | egrep '$p' >>>> superblock: bytenr=65536, device=/dev/loop0 >>>> flags 0x1000000001 >>>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>>> generation 9 >>>> num_devices 2 >>>> incompat_flags 0x741 >>>> dev_item.devid 1 >>>> >>>> $ btrfs inspect dump-super /dev/loop1 | egrep '$p' >>>> superblock: bytenr=65536, device=/dev/loop1 >>>> flags 0x1 >>>> fsid 11d2af4d-1b71-45a9-83f6-f2100766939d >>>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>>> generation 10 >>>> num_devices 2 >>>> incompat_flags 0x741 >>>> dev_item.devid 2 >>>> >>>> $ btrfs inspect dump-super /dev/loop2 | egrep '$p' >>>> superblock: bytenr=65536, device=/dev/loop2 >>>> flags 0x1 >>>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>>> generation 8 >>>> num_devices 2 >>>> incompat_flags 0x741 >>>> dev_item.devid 1 >>>> >>>> $ btrfs inspect dump-super /dev/loop3 | egrep '$p' >>>> superblock: bytenr=65536, device=/dev/loop3 >>>> flags 0x1 >>>> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>>> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >>>> generation 8 >>>> num_devices 2 >>>> incompat_flags 0x741 >>>> dev_item.devid 2 >>>> >>>> >>>> It is normal that some devices aren't instantly discovered during >>>> system boot or iSCSI discovery. The controlled scan below demonstrates >>>> this. >>>> >>>> $ btrfs device scan --forget >>>> $ btrfs device scan /dev/loop0 >>>> Scanning for btrfs filesystems on '/dev/loop0' >>>> $ mount /dev/loop3 /btrfs >>>> $ btrfs filesystem show -m >>>> Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >>>> Total devices 2 FS bytes used 144.00KiB >>>> devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 >>>> devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 >>>> >>>> /dev/loop0 and /dev/loop3 are incorrectly partnered. >>>> >>>> This kernel patch removes functions and code connected to the >>>> CHANGING_FSID_V2 flag. >>> >>> I didn't have a closer look but it seems you're removing all the logic >>> to make the metadata uuid robust and usable in case of interrupted >>> conversion, while finding another case where it does not work as you >>> expect. With this it would be change in behaviour, we need to check >>> the original use case. IIRC as the metadata uuid change is lightweight >>> we want to try harder to deal with the easy errors instead of rejecting >>> the filesystem mount. >> >> Robust indeed. Silently assembling wrong devices-data loss risk? >> Failing to assemble is still safe. >> >> I think it is better to introduce a sub-command to clone btrfs >> filesystem with a new device-uuid and same fsid (as it looks like >> same fsid has some use case). >> >> Thanks, Anand > > Oh, my memory comes back, the original design for the two stage > commitment is to avoid split brain cases when one device is committed > with the new flag, while the remaining one doesn't. > > With the extra stage, even if at stage 1 or 2 the transaction is > interrupted and only one device got the new flag, it can help us to > locate the stage and recover. As this comment is about the btrfstune patch [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit Let's discuss it there. Thanks.
On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: > The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state > where the device in the userspace btrfstune -m|-M operation failed to > complete changing the fsid. > > This flag makes the kernel to automatically determine the other > partner devices to which a given device can be associated, based on the > fsid, metadata_uuid and generation values. > > btrfstune -m|M feature is especially useful in virtual cloud setups, where > compute instances (disk images) are quickly copied, fsid changed, and > launched. Given numerous disk images with the same metadata_uuid but > different fsid, there's no clear way a device can be correctly assembled > with the proper partners when the CHANGING_FSID_V2 flag is set. So, the > disk could be assembled incorrectly, as in the example below: > > Before this patch: > > Consider the following two filesystems: > /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m > operation fails. > > In this scenario, as the /dev/loop0's fsid change is interrupted, and the > CHANGING_FSID_V2 flag is set as shown below. > > $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" > > $ btrfs inspect dump-super /dev/loop0 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop0 > flags 0x1000000001 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 9 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 1 > > $ btrfs inspect dump-super /dev/loop1 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop1 > flags 0x1 > fsid 11d2af4d-1b71-45a9-83f6-f2100766939d > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 10 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 2 > > $ btrfs inspect dump-super /dev/loop2 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop2 > flags 0x1 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 8 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 1 > > $ btrfs inspect dump-super /dev/loop3 | egrep '$p' > superblock: bytenr=65536, device=/dev/loop3 > flags 0x1 > fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b > generation 8 > num_devices 2 > incompat_flags 0x741 > dev_item.devid 2 > > > It is normal that some devices aren't instantly discovered during > system boot or iSCSI discovery. The controlled scan below demonstrates > this. > > $ btrfs device scan --forget > $ btrfs device scan /dev/loop0 > Scanning for btrfs filesystems on '/dev/loop0' > $ mount /dev/loop3 /btrfs > $ btrfs filesystem show -m > Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 > Total devices 2 FS bytes used 144.00KiB > devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 > devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 > > /dev/loop0 and /dev/loop3 are incorrectly partnered. > > This kernel patch removes functions and code connected to the > CHANGING_FSID_V2 flag. > > With this patch, now devices with the CHANGING_FSID_V2 flag are rejected. > And its partner will fail to mount with the extra -o degraded option. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > Moreover, a btrfs-progs patch (below) has eliminated the use of the > CHANGING_FSID_V2 flag entirely: > > [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit > > And we solve the compatability concerns as below: > > New-kernel new-progs - has no CHANGING_FSID_V2 flag. > Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused. > Old-kernel old-progs - bug may occur. > New-kernel old-progs - Should use host with the newer btrfs-progs to fix. > > For legacy systems to help fix such a condition in the userspace instead > we have the below patchset which ports of kernel's CHANGING_FSID_V2 code. > > [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid > > And if it couldn't fix in some cases, users can use manually reunite, > with the patchset: > > [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options > > fs/btrfs/disk-io.c | 10 --- > fs/btrfs/volumes.c | 166 ++++----------------------------------------- > fs/btrfs/volumes.h | 1 - > 3 files changed, 13 insertions(+), 164 deletions(-) Please split the kernel patch in two, one rejecting the CHANGING_FSID_V2 bit and then removing the unused code. I think the scanning code still has to recognize the bit and skip the device, I haven't checked if remains like that after this patch.
On 19/09/2023 06:44, David Sterba wrote: > On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote: >> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state >> where the device in the userspace btrfstune -m|-M operation failed to >> complete changing the fsid. >> >> This flag makes the kernel to automatically determine the other >> partner devices to which a given device can be associated, based on the >> fsid, metadata_uuid and generation values. >> >> btrfstune -m|M feature is especially useful in virtual cloud setups, where >> compute instances (disk images) are quickly copied, fsid changed, and >> launched. Given numerous disk images with the same metadata_uuid but >> different fsid, there's no clear way a device can be correctly assembled >> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the >> disk could be assembled incorrectly, as in the example below: >> >> Before this patch: >> >> Consider the following two filesystems: >> /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m >> operation fails. >> >> In this scenario, as the /dev/loop0's fsid change is interrupted, and the >> CHANGING_FSID_V2 flag is set as shown below. >> >> $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" >> >> $ btrfs inspect dump-super /dev/loop0 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop0 >> flags 0x1000000001 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 9 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 1 >> >> $ btrfs inspect dump-super /dev/loop1 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop1 >> flags 0x1 >> fsid 11d2af4d-1b71-45a9-83f6-f2100766939d >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 10 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 2 >> >> $ btrfs inspect dump-super /dev/loop2 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop2 >> flags 0x1 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 8 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 1 >> >> $ btrfs inspect dump-super /dev/loop3 | egrep '$p' >> superblock: bytenr=65536, device=/dev/loop3 >> flags 0x1 >> fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b >> generation 8 >> num_devices 2 >> incompat_flags 0x741 >> dev_item.devid 2 >> >> >> It is normal that some devices aren't instantly discovered during >> system boot or iSCSI discovery. The controlled scan below demonstrates >> this. >> >> $ btrfs device scan --forget >> $ btrfs device scan /dev/loop0 >> Scanning for btrfs filesystems on '/dev/loop0' >> $ mount /dev/loop3 /btrfs >> $ btrfs filesystem show -m >> Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 >> Total devices 2 FS bytes used 144.00KiB >> devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 >> devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 >> >> /dev/loop0 and /dev/loop3 are incorrectly partnered. >> >> This kernel patch removes functions and code connected to the >> CHANGING_FSID_V2 flag. >> >> With this patch, now devices with the CHANGING_FSID_V2 flag are rejected. >> And its partner will fail to mount with the extra -o degraded option. >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> Moreover, a btrfs-progs patch (below) has eliminated the use of the >> CHANGING_FSID_V2 flag entirely: >> >> [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit >> >> And we solve the compatability concerns as below: >> >> New-kernel new-progs - has no CHANGING_FSID_V2 flag. >> Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused. >> Old-kernel old-progs - bug may occur. >> New-kernel old-progs - Should use host with the newer btrfs-progs to fix. >> >> For legacy systems to help fix such a condition in the userspace instead >> we have the below patchset which ports of kernel's CHANGING_FSID_V2 code. >> >> [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid >> >> And if it couldn't fix in some cases, users can use manually reunite, >> with the patchset: >> >> [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options >> >> fs/btrfs/disk-io.c | 10 --- >> fs/btrfs/volumes.c | 166 ++++----------------------------------------- >> fs/btrfs/volumes.h | 1 - >> 3 files changed, 13 insertions(+), 164 deletions(-) > > Please split the kernel patch in two, one rejecting the CHANGING_FSID_V2 > bit and then removing the unused code. Yep. Will do. > I think the scanning code still > has to recognize the bit and skip the device, I haven't checked if > remains like that after this patch. Right. This patch does it. We only have to do it at device_list_add() because mount, device-scan, and device-ready all rely on device_list_add() to scan. Thanks, Anand
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 9858c77b99e6..98e73805f3fc 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3156,7 +3156,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device u32 nodesize; u32 stripesize; u64 generation; - u64 features; u16 csum_type; struct btrfs_super_block *disk_super; struct btrfs_fs_info *fs_info = btrfs_sb(sb); @@ -3238,15 +3237,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device disk_super = fs_info->super_copy; - - features = btrfs_super_flags(disk_super); - if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) { - features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2; - btrfs_set_super_flags(disk_super, features); - btrfs_info(fs_info, - "found metadata UUID change in progress flag, clearing"); - } - memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_for_commit)); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 852590e06d76..e3a83f779558 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -453,58 +453,6 @@ static noinline struct btrfs_fs_devices *find_fsid( return NULL; } -/* - * First check if the metadata_uuid is different from the fsid in the given - * fs_devices. Then check if the given fsid is the same as the metadata_uuid - * in the fs_devices. If it is, return true; otherwise, return false. - */ -static inline bool check_fsid_changed(const struct btrfs_fs_devices *fs_devices, - const u8 *fsid) -{ - return memcmp(fs_devices->fsid, fs_devices->metadata_uuid, - BTRFS_FSID_SIZE) != 0 && - memcmp(fs_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE) == 0; -} - -static struct btrfs_fs_devices *find_fsid_with_metadata_uuid( - struct btrfs_super_block *disk_super) -{ - - struct btrfs_fs_devices *fs_devices; - - /* - * Handle scanned device having completed its fsid change but - * belonging to a fs_devices that was created by first scanning - * a device which didn't have its fsid/metadata_uuid changed - * at all and the CHANGING_FSID_V2 flag set. - */ - list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - if (!fs_devices->fsid_change) - continue; - - if (match_fsid_fs_devices(fs_devices, disk_super->metadata_uuid, - fs_devices->fsid)) - return fs_devices; - } - - /* - * Handle scanned device having completed its fsid change but - * belonging to a fs_devices that was created by a device that - * has an outdated pair of fsid/metadata_uuid and - * CHANGING_FSID_V2 flag set. - */ - list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - if (!fs_devices->fsid_change) - continue; - - if (check_fsid_changed(fs_devices, disk_super->metadata_uuid)) - return fs_devices; - } - - return find_fsid(disk_super->fsid, disk_super->metadata_uuid); -} - - static int btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder, int flush, struct block_device **bdev, @@ -685,84 +633,6 @@ u8 *btrfs_sb_fsid_ptr(struct btrfs_super_block *sb) return has_metadata_uuid ? sb->metadata_uuid : sb->fsid; } -/* - * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices - * being created with a disk that has already completed its fsid change. Such - * disk can belong to an fs which has its FSID changed or to one which doesn't. - * Handle both cases here. - */ -static struct btrfs_fs_devices *find_fsid_inprogress( - struct btrfs_super_block *disk_super) -{ - struct btrfs_fs_devices *fs_devices; - - list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - if (fs_devices->fsid_change) - continue; - - if (check_fsid_changed(fs_devices, disk_super->fsid)) - return fs_devices; - } - - return find_fsid(disk_super->fsid, NULL); -} - -static struct btrfs_fs_devices *find_fsid_changed( - struct btrfs_super_block *disk_super) -{ - struct btrfs_fs_devices *fs_devices; - - /* - * Handles the case where scanned device is part of an fs that had - * multiple successful changes of FSID but currently device didn't - * observe it. Meaning our fsid will be different than theirs. We need - * to handle two subcases : - * 1 - The fs still continues to have different METADATA/FSID uuids. - * 2 - The fs is switched back to its original FSID (METADATA/FSID - * are equal). - */ - list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - /* Changed UUIDs */ - if (check_fsid_changed(fs_devices, disk_super->metadata_uuid) && - memcmp(fs_devices->fsid, disk_super->fsid, - BTRFS_FSID_SIZE) != 0) - return fs_devices; - - /* Unchanged UUIDs */ - if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid, - BTRFS_FSID_SIZE) == 0 && - memcmp(fs_devices->fsid, disk_super->metadata_uuid, - BTRFS_FSID_SIZE) == 0) - return fs_devices; - } - - return NULL; -} - -static struct btrfs_fs_devices *find_fsid_reverted_metadata( - struct btrfs_super_block *disk_super) -{ - struct btrfs_fs_devices *fs_devices; - - /* - * Handle the case where the scanned device is part of an fs whose last - * metadata UUID change reverted it to the original FSID. At the same - * time fs_devices was first created by another constituent device - * which didn't fully observe the operation. This results in an - * btrfs_fs_devices created with metadata/fsid different AND - * btrfs_fs_devices::fsid_change set AND the metadata_uuid of the - * fs_devices equal to the FSID of the disk. - */ - list_for_each_entry(fs_devices, &fs_uuids, fs_list) { - if (!fs_devices->fsid_change) - continue; - - if (check_fsid_changed(fs_devices, disk_super->fsid)) - return fs_devices; - } - - return NULL; -} /* * Add new device to list of registered devices * @@ -783,8 +653,14 @@ static noinline struct btrfs_device *device_list_add(const char *path, int error; bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) & BTRFS_FEATURE_INCOMPAT_METADATA_UUID); - bool fsid_change_in_progress = (btrfs_super_flags(disk_super) & - BTRFS_SUPER_FLAG_CHANGING_FSID_V2); + + if ((btrfs_super_flags(disk_super) & + BTRFS_SUPER_FLAG_CHANGING_FSID_V2)) { + btrfs_err(NULL, +"device %s has incomplete FSID changes please use btrfstune to complete", + path); + return ERR_PTR(-EINVAL); + } error = lookup_bdev(path, &path_devt); if (error) { @@ -793,20 +669,13 @@ static noinline struct btrfs_device *device_list_add(const char *path, return ERR_PTR(error); } - if (fsid_change_in_progress) { - if (!has_metadata_uuid) - fs_devices = find_fsid_inprogress(disk_super); - else - fs_devices = find_fsid_changed(disk_super); - } else if (has_metadata_uuid) { - fs_devices = find_fsid_with_metadata_uuid(disk_super); + if (has_metadata_uuid) { + fs_devices = find_fsid(disk_super->fsid, + disk_super->metadata_uuid); } else { - fs_devices = find_fsid_reverted_metadata(disk_super); - if (!fs_devices) - fs_devices = find_fsid(disk_super->fsid, NULL); + fs_devices = find_fsid(disk_super->fsid, NULL); } - if (!fs_devices) { fs_devices = alloc_fs_devices(disk_super->fsid); if (has_metadata_uuid) @@ -816,8 +685,6 @@ static noinline struct btrfs_device *device_list_add(const char *path, if (IS_ERR(fs_devices)) return ERR_CAST(fs_devices); - fs_devices->fsid_change = fsid_change_in_progress; - mutex_lock(&fs_devices->device_list_mutex); list_add(&fs_devices->fs_list, &fs_uuids); @@ -831,18 +698,11 @@ static noinline struct btrfs_device *device_list_add(const char *path, mutex_lock(&fs_devices->device_list_mutex); device = btrfs_find_device(fs_devices, &args); - /* - * If this disk has been pulled into an fs devices created by - * a device which had the CHANGING_FSID_V2 flag then replace the - * metadata_uuid/fsid values of the fs_devices. - */ - if (fs_devices->fsid_change && - found_transid > fs_devices->latest_generation) { + if (found_transid > fs_devices->latest_generation) { memcpy(fs_devices->fsid, disk_super->fsid, BTRFS_FSID_SIZE); memcpy(fs_devices->metadata_uuid, btrfs_sb_fsid_ptr(disk_super), BTRFS_FSID_SIZE); - fs_devices->fsid_change = false; } } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 2128a032c3b7..6672ca13eda5 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -353,7 +353,6 @@ struct btrfs_fs_devices { bool rotating; /* Devices support TRIM/discard commands. */ bool discardable; - bool fsid_change; /* The filesystem is a seed filesystem. */ bool seeding;
The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state where the device in the userspace btrfstune -m|-M operation failed to complete changing the fsid. This flag makes the kernel to automatically determine the other partner devices to which a given device can be associated, based on the fsid, metadata_uuid and generation values. btrfstune -m|M feature is especially useful in virtual cloud setups, where compute instances (disk images) are quickly copied, fsid changed, and launched. Given numerous disk images with the same metadata_uuid but different fsid, there's no clear way a device can be correctly assembled with the proper partners when the CHANGING_FSID_V2 flag is set. So, the disk could be assembled incorrectly, as in the example below: Before this patch: Consider the following two filesystems: /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m operation fails. In this scenario, as the /dev/loop0's fsid change is interrupted, and the CHANGING_FSID_V2 flag is set as shown below. $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags" $ btrfs inspect dump-super /dev/loop0 | egrep '$p' superblock: bytenr=65536, device=/dev/loop0 flags 0x1000000001 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 9 num_devices 2 incompat_flags 0x741 dev_item.devid 1 $ btrfs inspect dump-super /dev/loop1 | egrep '$p' superblock: bytenr=65536, device=/dev/loop1 flags 0x1 fsid 11d2af4d-1b71-45a9-83f6-f2100766939d metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 10 num_devices 2 incompat_flags 0x741 dev_item.devid 2 $ btrfs inspect dump-super /dev/loop2 | egrep '$p' superblock: bytenr=65536, device=/dev/loop2 flags 0x1 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 8 num_devices 2 incompat_flags 0x741 dev_item.devid 1 $ btrfs inspect dump-super /dev/loop3 | egrep '$p' superblock: bytenr=65536, device=/dev/loop3 flags 0x1 fsid 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 metadata_uuid bb040a9f-233a-4de2-ad84-49aa5a28059b generation 8 num_devices 2 incompat_flags 0x741 dev_item.devid 2 It is normal that some devices aren't instantly discovered during system boot or iSCSI discovery. The controlled scan below demonstrates this. $ btrfs device scan --forget $ btrfs device scan /dev/loop0 Scanning for btrfs filesystems on '/dev/loop0' $ mount /dev/loop3 /btrfs $ btrfs filesystem show -m Label: none uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45 Total devices 2 FS bytes used 144.00KiB devid 1 size 300.00MiB used 48.00MiB path /dev/loop0 devid 2 size 300.00MiB used 40.00MiB path /dev/loop3 /dev/loop0 and /dev/loop3 are incorrectly partnered. This kernel patch removes functions and code connected to the CHANGING_FSID_V2 flag. With this patch, now devices with the CHANGING_FSID_V2 flag are rejected. And its partner will fail to mount with the extra -o degraded option. Signed-off-by: Anand Jain <anand.jain@oracle.com> --- Moreover, a btrfs-progs patch (below) has eliminated the use of the CHANGING_FSID_V2 flag entirely: [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit And we solve the compatability concerns as below: New-kernel new-progs - has no CHANGING_FSID_V2 flag. Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused. Old-kernel old-progs - bug may occur. New-kernel old-progs - Should use host with the newer btrfs-progs to fix. For legacy systems to help fix such a condition in the userspace instead we have the below patchset which ports of kernel's CHANGING_FSID_V2 code. [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid And if it couldn't fix in some cases, users can use manually reunite, with the patchset: [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options fs/btrfs/disk-io.c | 10 --- fs/btrfs/volumes.c | 166 ++++----------------------------------------- fs/btrfs/volumes.h | 1 - 3 files changed, 13 insertions(+), 164 deletions(-)