Message ID | 1632330390-29793-1-git-send-email-zhanglikernel@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clear_bit BTRFS_DEV_STATE_MISSING if the device->bdev is not NULL During mount. | expand |
On 22.09.21 г. 20:06, Li Zhang wrote: > reference to: > https://github.com/kdave/btrfs-progs/issues/389 > If a device offline somehow, but after a moment, it is back > online, btrfs filesystem only mark the device is missing, but > don't mark it right. > > However, Although filesystem mark device presented, in my > case the /dev/loop2 is come back, the generation of this > /dev/loop2 super block is not right.Because the device's > data is not up-to-date. So the info of status of scrubs > would list as following.: > > $ losetup /dev/loop1 test1 > $ losetup /dev/loop2 test2 > $ losetup -d /dev/loop2 > $ mount -o degraded /dev/loop1 /mnt/1 > $ touch /mnt/1/file1.txt > $ umount /mnt/1 > $ losetup /dev/loop2 test2 > $ mount /dev/loop1 /mnt/1 > $ btrfs scrub start /mnt/1 > scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477) > $ WARNING: errors detected during scrubbing, corrected > THis seems it can be turned into an fstests rather easily, care to send a patch for it as well ? > $btrfs scrub status /mnt/1 > UUID: 88a3295f-c052-4208-9b1f-7b2c5074c20a > Scrub started: Thu Sep 23 00:34:55 2021 > Status: finished > Duration: 0:00:01 > Total to scrub: 272.00KiB > Rate: 272.00KiB/s > Error summary: super=2 verify=5 > Corrected: 5 > Uncorrectable: 0 > Unverified: 0 > > And if we umount and mount again everything would be all right. > > In my opinion, we could improve this scrub in the following > way, i personally vote the second method > > 1) Sync all data immediately during mounting, but it waste IO > resource. > > 2) In scrub, we should give detailed information of every device > instead of a single filesystem, since scrub is launching a number > of thread to scanning each device instead scan whole filesystem. > Futher more we should give user hint about how to fix it, in my > case, user should umount filesystem and mount it again. But if > data is corrupt, the repair program might be called, in case of > further damage, user should replace a device and reconstruct > ata using raid1, raid6 and so on. > > Signed-off-by: Li Zhang <zhanglikernel@gmail.com> > --- > fs/btrfs/volumes.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 464485a..99fdbaa 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf, > set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); > } > > + if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) { > + device->fs_devices->missing_devices--; > + clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); > + } > + > /* Move the device to its own fs_devices */ > if (device->fs_devices != fs_devices) { > ASSERT(test_bit(BTRFS_DEV_STATE_MISSING, >
Sure, I would love to do it. To avoid ambiguity, Should I and write a test case to detect whether clear the BTRFS_DEV_STATE_MISSING if filesystem found a device, but it was marked to BTRFS_DEV_STATE_MISSING. Nikolay Borisov <nborisov@suse.com> 于2021年9月23日周四 下午3:41写道: > > > > On 22.09.21 г. 20:06, Li Zhang wrote: > > reference to: > > https://github.com/kdave/btrfs-progs/issues/389 > > If a device offline somehow, but after a moment, it is back > > online, btrfs filesystem only mark the device is missing, but > > don't mark it right. > > > > However, Although filesystem mark device presented, in my > > case the /dev/loop2 is come back, the generation of this > > /dev/loop2 super block is not right.Because the device's > > data is not up-to-date. So the info of status of scrubs > > would list as following.: > > > > $ losetup /dev/loop1 test1 > > $ losetup /dev/loop2 test2 > > $ losetup -d /dev/loop2 > > $ mount -o degraded /dev/loop1 /mnt/1 > > $ touch /mnt/1/file1.txt > > $ umount /mnt/1 > > $ losetup /dev/loop2 test2 > > $ mount /dev/loop1 /mnt/1 > > $ btrfs scrub start /mnt/1 > > scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477) > > $ WARNING: errors detected during scrubbing, corrected > > > > THis seems it can be turned into an fstests rather easily, care to send > a patch for it as well ? > > > $btrfs scrub status /mnt/1 > > UUID: 88a3295f-c052-4208-9b1f-7b2c5074c20a > > Scrub started: Thu Sep 23 00:34:55 2021 > > Status: finished > > Duration: 0:00:01 > > Total to scrub: 272.00KiB > > Rate: 272.00KiB/s > > Error summary: super=2 verify=5 > > Corrected: 5 > > Uncorrectable: 0 > > Unverified: 0 > > > > And if we umount and mount again everything would be all right. > > > > In my opinion, we could improve this scrub in the following > > way, i personally vote the second method > > > > 1) Sync all data immediately during mounting, but it waste IO > > resource. > > > > 2) In scrub, we should give detailed information of every device > > instead of a single filesystem, since scrub is launching a number > > of thread to scanning each device instead scan whole filesystem. > > Futher more we should give user hint about how to fix it, in my > > case, user should umount filesystem and mount it again. But if > > data is corrupt, the repair program might be called, in case of > > further damage, user should replace a device and reconstruct > > ata using raid1, raid6 and so on. > > > > Signed-off-by: Li Zhang <zhanglikernel@gmail.com> > > --- > > fs/btrfs/volumes.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > > index 464485a..99fdbaa 100644 > > --- a/fs/btrfs/volumes.c > > +++ b/fs/btrfs/volumes.c > > @@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf, > > set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); > > } > > > > + if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) { > > + device->fs_devices->missing_devices--; > > + clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); > > + } > > + > > /* Move the device to its own fs_devices */ > > if (device->fs_devices != fs_devices) { > > ASSERT(test_bit(BTRFS_DEV_STATE_MISSING, > >
On 23.09.21 г. 13:10, li zhang wrote: > Sure, I would love to do it. > > To avoid ambiguity, Should I and write a test > case to detect whether clear the BTRFS_DEV_STATE_MISSING > if filesystem found a device, but it was marked to > BTRFS_DEV_STATE_MISSING. Yes, basically the idea is for the test case to fail without your patch and pass with your patch. That way we'll ensure this won't regress in the future.
Got it. Nikolay Borisov <nborisov@suse.com> 于2021年9月23日周四 下午6:53写道: > > > > On 23.09.21 г. 13:10, li zhang wrote: > > Sure, I would love to do it. > > > > To avoid ambiguity, Should I and write a test > > case to detect whether clear the BTRFS_DEV_STATE_MISSING > > if filesystem found a device, but it was marked to > > BTRFS_DEV_STATE_MISSING. > > Yes, basically the idea is for the test case to fail without your patch > and pass with your patch. That way we'll ensure this won't regress in > the future.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 464485a..99fdbaa 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7198,6 +7198,11 @@ static int read_one_dev(struct extent_buffer *leaf, set_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); } + if (device->bdev && test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state) && device->fs_devices->missing_devices > 0) { + device->fs_devices->missing_devices--; + clear_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state); + } + /* Move the device to its own fs_devices */ if (device->fs_devices != fs_devices) { ASSERT(test_bit(BTRFS_DEV_STATE_MISSING,
reference to: https://github.com/kdave/btrfs-progs/issues/389 If a device offline somehow, but after a moment, it is back online, btrfs filesystem only mark the device is missing, but don't mark it right. However, Although filesystem mark device presented, in my case the /dev/loop2 is come back, the generation of this /dev/loop2 super block is not right.Because the device's data is not up-to-date. So the info of status of scrubs would list as following.: $ losetup /dev/loop1 test1 $ losetup /dev/loop2 test2 $ losetup -d /dev/loop2 $ mount -o degraded /dev/loop1 /mnt/1 $ touch /mnt/1/file1.txt $ umount /mnt/1 $ losetup /dev/loop2 test2 $ mount /dev/loop1 /mnt/1 $ btrfs scrub start /mnt/1 scrub started on /mnt/1, fsid 88a3295f-c052-4208-9b1f-7b2c5074c20a (pid=2477) $ WARNING: errors detected during scrubbing, corrected $btrfs scrub status /mnt/1 UUID: 88a3295f-c052-4208-9b1f-7b2c5074c20a Scrub started: Thu Sep 23 00:34:55 2021 Status: finished Duration: 0:00:01 Total to scrub: 272.00KiB Rate: 272.00KiB/s Error summary: super=2 verify=5 Corrected: 5 Uncorrectable: 0 Unverified: 0 And if we umount and mount again everything would be all right. In my opinion, we could improve this scrub in the following way, i personally vote the second method 1) Sync all data immediately during mounting, but it waste IO resource. 2) In scrub, we should give detailed information of every device instead of a single filesystem, since scrub is launching a number of thread to scanning each device instead scan whole filesystem. Futher more we should give user hint about how to fix it, in my case, user should umount filesystem and mount it again. But if data is corrupt, the repair program might be called, in case of further damage, user should replace a device and reconstruct ata using raid1, raid6 and so on. Signed-off-by: Li Zhang <zhanglikernel@gmail.com> --- fs/btrfs/volumes.c | 5 +++++ 1 file changed, 5 insertions(+)