[00/16] btrfs-progs: recover from failed metadata_uuid

Message ID cover.1692018849.git.anand.jain@oracle.com (mailing list archive)

Anand Jain Aug. 14, 2023, 3:27 p.m. UTC
The kernel reunites split-brained devices after a failed `btrfstune -m|M`
operation. btrfs-progs can do the same, so port that logic here.
See the discussion here:

   https://lore.kernel.org/all/1fa6802b-5812-14a8-3fc8-5da54bb5f79d@oracle.com/

Patch 1/16 was not integrated as part of the set
	[PATCH 00/10 v2] fixes and preparatory related to metadata_uuid
so it is now merged into this patchset.

Patches [2-6,11,12] are cleanup patches.

Patches [7,8,10] are preparatory.

Patch [9] addresses a bug.

Patches [13, 14, 15] provide recovery from previously failed
`btrfstune -m|M` operations.

Patch [16] enhances the misc-test `034-metadata-uuid` to also validate this
new recovery feature.

This set has been successfully tested with the btrfs-progs testsuite.

This patchset is based on the latest devel branch, whose last commit is:
 8aba9b0052b6 btrfs-progs: btrfstune: consolidate error handling in main()


Anand Jain (16):
  btrfs-progs: track num_devices per fs_devices
  btrfs-progs: tune can use local fs_info variable
  btrfs-progs: rename set_metadata_uuid arg to new_fsid_str
  btrfs-progs: rename set_metadata_uuid new_fsid to fsid
  btrfs-progs: rename set_metadata_uuid new_uuid to new_fsid
  btrfs-progs: rename set_metadata_uuid uuid_changed to fsid_changed
  btrfs-progs: pass fsid in check_unfinished_fsid_change arg2
  btrfs-progs: pass metadata_uuid in check_unfinished_fsid_change arg3
  btrfs-progs: fix return without flag reset commit in tune
  btrfs-progs: preparing the latest device's superblock for commit
  btrfs-progs: rename fs_devices::list to match the kernel
  btrfs-progs: rename fs_devices::latest_trans to match the kernel
  btrfs-progs: tune use the latest bdev in fs_devices for super_copy
  btrfs-progs: add support to fix superblock with CHANGING_FSID_V2 flag
  btrfs-progs: recover from the failed btrfstune -m|M
  btrfs-progs: test btrfstune -m|M ability to fix previous failures

 cmds/filesystem.c                          |  14 +-
 common/device-scan.c                       |   2 +-
 kernel-shared/disk-io.c                    |   3 +
 kernel-shared/disk-io.h                    |   5 +
 kernel-shared/volumes.c                    | 203 +++++++++++++++++++--
 kernel-shared/volumes.h                    |   5 +-
 tests/misc-tests/034-metadata-uuid/test.sh |  70 +++++--
 tune/change-metadata-uuid.c                |  56 +++---
 tune/main.c                                |  35 ++--
 9 files changed, 312 insertions(+), 81 deletions(-)

Comments

David Sterba Aug. 23, 2023, 10:13 p.m. UTC | #1
Patches added to devel, thanks.
David Sterba Aug. 23, 2023, 10:24 p.m. UTC | #2
On Thu, Aug 24, 2023 at 12:13:15AM +0200, David Sterba wrote:
> Patches added to devel, thanks.

On my machine the metadata uuid test does not run because the module is
not loadable, but the GH actions report a failure:
https://github.com/kdave/btrfs-progs/actions/runs/5956097489/job/16156138260
Anand Jain Aug. 24, 2023, 1:54 p.m. UTC | #3
On 24/08/2023 06:24, David Sterba wrote:
> On Thu, Aug 24, 2023 at 12:13:15AM +0200, David Sterba wrote:
>> Patches added to devel, thanks.
> 
> On my machine the metadata uuid test does not run because the module is
> not loadable, but the GH actions report a failure:
> https://github.com/kdave/btrfs-progs/actions/runs/5956097489/job/16156138260

My local VM runs misc-tests/034* successfully. However, on OCI I see the
same error as GH: it reports a missing device. The inconsistent results
appear to be due to the device scan order varying from system to system.
I am looking into it further.
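
To illustrate the scan-order concern, here is a minimal sketch (not the
btrfs-progs code, and not a claim about the actual root cause) of choosing
the device whose superblock has the highest generation, so that the result
does not depend on the order in which the devices were scanned. The struct
`dev_info`, its fields, and `pick_latest_dev()` are hypothetical names used
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device summary; not a btrfs-progs structure. */
struct dev_info {
	uint64_t devid;       /* device id within the filesystem */
	uint64_t generation;  /* superblock generation read from this device */
};

/*
 * Return the entry with the highest generation. Ties are broken by the
 * lowest devid, so the result is the same for any ordering of devs[].
 */
static const struct dev_info *pick_latest_dev(const struct dev_info *devs,
					      size_t n)
{
	const struct dev_info *best;
	size_t i;

	assert(devs != NULL && n > 0);

	best = &devs[0];
	for (i = 1; i < n; i++) {
		if (devs[i].generation > best->generation ||
		    (devs[i].generation == best->generation &&
		     devs[i].devid < best->devid))
			best = &devs[i];
	}
	return best;
}
```

The explicit tie-break is the point of the sketch: without it, two devices
with equal generation would be resolved by whichever was scanned first.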
David Sterba Aug. 25, 2023, 11:53 a.m. UTC | #4
On Thu, Aug 24, 2023 at 09:54:30PM +0800, Anand Jain wrote:
> >> Patches added to devel, thanks.
> > 
> > On my machine the metadata uuid test does not run because the module is
> > not loadable, but the GH actions report a failure:
> > https://github.com/kdave/btrfs-progs/actions/runs/5956097489/job/16156138260
> 
> My local VM runs misc-tests/034* successfully. However, on OCI I see the
> same error as GH: it reports a missing device. The inconsistent results
> appear to be due to the device scan order varying from system to system.
> I am looking into it further.

Patches 13-16 have been removed from devel until the issue is resolved.
I've enabled build tests for pull requests, so you can use the GitHub CI
for testing too (open a PR against the devel or master branch).
Anand Jain Aug. 25, 2023, 2:57 p.m. UTC | #5
On 8/25/23 19:53, David Sterba wrote:
> On Thu, Aug 24, 2023 at 09:54:30PM +0800, Anand Jain wrote:
>>>> Patches added to devel, thanks.
>>>
>>> On my machine the metadata uuid test does not run because the module is
>>> not loadable, but the GH actions report a failure:
>>> https://github.com/kdave/btrfs-progs/actions/runs/5956097489/job/16156138260
>>
>> My local VM runs misc-tests/034* successfully. However, on OCI I see the
>> same error as GH: it reports a missing device. The inconsistent results
>> appear to be due to the device scan order varying from system to system.
>> I am looking into it further.
> 
> Patches 13-16 have been removed from devel until the issue is resolved.
> I've enabled build tests for pull requests, so you can use the GitHub CI
> for testing too (open a PR against the devel or master branch).

Yes, I noticed that patches 1 to 12 have been merged. I am able to
reproduce the bug. V3 has been tested and works fine.