Message ID | 20170419210745.15263-1-kilobyte@angband.pl (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Adam Borowski posted on Wed, 19 Apr 2017 23:07:45 +0200 as excerpted: > Too many people come complaining about losing their data -- and indeed, > there's no warning outside a wiki and the mailing list tribal knowledge. > Message severity chosen for consistency with XFS -- "alert" makes dmesg > produce nice red background which should get the point across. Commenting on the idea and comment, because as a non-coder list regular, that's what I can evaluate fairly. =:^) A kernel dmesg warning like this makes more sense to me than trying to put it in, for instance, mkfs.btrfs, because the instability is primarily kernel code and at least the message can stay synced with it, being removed when considered appropriate, unlike userspace code which can't, because people often run userspace and kernelspace versions well out of sync with each other. > I intend to ask for inclusion of this one (or an equivalent) in 4.9, > either in Debian or via GregKH -- while for us kernels "that old" are > history, regular users expect stable releases to be free of known > serious data loss bugs. Arguably it should go in the LTS-4.4 series as well, because we at least try to support the last two LTS series on-list, more or less giving up beyond that, and that's the relatively new 4.9 and the now going stale but we really should be still trying to support it 4.4. Older than that, 4.1 was the only LTS after initial code completion, but since it should be simple enough even before that and certainly would be true, queuing the patch for any still being updated LTS back to initial partial support (3.9 IIRC) is arguably worthwhile.
On Thu, Apr 20, 2017 at 3:13 PM, Duncan <1i5t5.duncan@cox.net> wrote: > Adam Borowski posted on Wed, 19 Apr 2017 23:07:45 +0200 as excerpted: > >> Too many people come complaining about losing their data -- and indeed, >> there's no warning outside a wiki and the mailing list tribal knowledge. >> Message severity chosen for consistency with XFS -- "alert" makes dmesg >> produce nice red background which should get the point across. > > Commenting on the idea and comment, because as a non-coder list regular, > that's what I can evaluate fairly. =:^) > > A kernel dmesg warning like this makes more sense to me than trying to > put it in, for instance, mkfs.btrfs, because the instability is primarily > kernel code and at least the message can stay synced with it, being > removed when considered appropriate, unlike userspace code which can't, > because people often run userspace and kernelspace versions well out of > sync with each other. > Seconded. As someone who's been trying to get BtrFS adopted, the biggest hurdle has been around perception. Rarely do people use the userspace tools directly, but rather through multiple layers of abstraction where they don't see any warnings coming from it. I think adding these warnings to kernel logs is an excellent suggestion. >> I intend to ask for inclusion of this one (or an equivalent) in 4.9, >> either in Debian or via GregKH -- while for us kernels "that old" are >> history, regular users expect stable releases to be free of known >> serious data loss bugs. > > Arguably it should go in the LTS-4.4 series as well, because we at least > try to support the last two LTS series on-list, more or less giving up > beyond that, and that's the relatively new 4.9 and the now going stale > but we really should be still trying to support it 4.4. Older than that, > 4.1 was the only LTS after initial code completion, but since it should > be simple enough even before that and certainly would be true, queuing > the patch for any still being updated LTS back to initial partial support > (3.9 IIRC) is arguably worthwhile. > > -- > Duncan - List replies preferred. No HTML msgs. > "Every nonfree program has a lord, a master -- > and if you use the program, he is your master." Richard Stallman > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 19, 2017 at 11:07:45PM +0200, Adam Borowski wrote: > Too many people come complaining about losing their data -- and indeed, > there's no warning outside a wiki and the mailing list tribal knowledge. > Message severity chosen for consistency with XFS -- "alert" makes dmesg > produce nice red background which should get the point across. ... > I intend to ask for inclusion of this one (or an equivalent) in 4.9, either > in Debian or via GregKH -- while for us kernels "that old" are history, > regular users expect stable releases to be free of known serious data loss > bugs. Hi guys, could you please comment? While there's only relatively little urgency for mainline (heck, it'd be best if the warning was not needed at all!), there's a Debian release close by, and it's be grossly inresponsible to not let people know that a feature advertised in the documentation is in an unusable state (especially as of 4.9). For you, filesystem developers, a way of thinking that "the user should do research" might be acceptable, but once it filters down to a stable release, the user expects no known serious bugs. And here the severity is "critical -- causes serious data loss". > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index e54844767fe5..e7f91f70e149 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3083,6 +3083,14 @@ int open_ctree(struct super_block *sb, > btrfs_set_opt(fs_info->mount_opt, SSD); > } > > + if ((fs_info->avail_data_alloc_bits | > + fs_info->avail_metadata_alloc_bits | > + fs_info->avail_system_alloc_bits) & > + BTRFS_BLOCK_GROUP_RAID56_MASK) { > + btrfs_alert(fs_info, > + "btrfs RAID5/6 is EXPERIMENTAL and has known data-loss bugs"); > + } > + > /* > * Mount does not set all options immediately, we can do it now and do > * not have to wait for transaction commit > -- Doing this in the kernel should be better than in userspace (like https://patchwork.kernel.org/patch/9450035/) as it can deal with a future kernel with working RAID5/6 on old -progs; but if you prefer, I can finish that patch and request its inclusion in Debian stretch -progs instead or in addition to the above warning in the kernel. ᛗᛖᛟᚹ!
On Wed, Apr 26, 2017 at 04:14:16AM +0200, Adam Borowski wrote: > On Wed, Apr 19, 2017 at 11:07:45PM +0200, Adam Borowski wrote: > > Too many people come complaining about losing their data -- and indeed, > > there's no warning outside a wiki and the mailing list tribal knowledge. > > Message severity chosen for consistency with XFS -- "alert" makes dmesg > > produce nice red background which should get the point across. > ... > > I intend to ask for inclusion of this one (or an equivalent) in 4.9, either > > in Debian or via GregKH -- while for us kernels "that old" are history, > > regular users expect stable releases to be free of known serious data loss > > bugs. > > Hi guys, could you please comment? While there's only relatively little > urgency for mainline (heck, it'd be best if the warning was not needed at > all!), there's a Debian release close by, and it's be grossly inresponsible > to not let people know that a feature advertised in the documentation is in > an unusable state (especially as of 4.9). For you, filesystem developers, > a way of thinking that "the user should do research" might be acceptable, > but once it filters down to a stable release, the user expects no known > serious bugs. There are some raid56 fixes in 4.12, but IIRC the write hole is still unfixed so the warning is still valid even for 4.12. It would be easier to get the patch to 4.4 or 4.9 once it's in Linus tree. > > And here the severity is "critical -- causes serious data loss". > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index e54844767fe5..e7f91f70e149 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -3083,6 +3083,14 @@ int open_ctree(struct super_block *sb, > > btrfs_set_opt(fs_info->mount_opt, SSD); > > } > > > > + if ((fs_info->avail_data_alloc_bits | > > + fs_info->avail_metadata_alloc_bits | > > + fs_info->avail_system_alloc_bits) & > > + BTRFS_BLOCK_GROUP_RAID56_MASK) { > > + btrfs_alert(fs_info, > > + "btrfs RAID5/6 is EXPERIMENTAL and has known data-loss bugs"); > > + } > > + > > /* > > * Mount does not set all options immediately, we can do it now and do > > * not have to wait for transaction commit > > -- > > Doing this in the kernel should be better than in userspace (like > https://patchwork.kernel.org/patch/9450035/) as it can deal with a future > kernel with working RAID5/6 on old -progs; but if you prefer, I can finish > that patch and request its inclusion in Debian stretch -progs instead or in > addition to the above warning in the kernel. I'll have look again. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 05, 2017 at 09:45:30PM +0200, David Sterba wrote: > On Wed, Apr 26, 2017 at 04:14:16AM +0200, Adam Borowski wrote: > > On Wed, Apr 19, 2017 at 11:07:45PM +0200, Adam Borowski wrote: > > > Too many people come complaining about losing their data -- and indeed, > > > there's no warning outside a wiki and the mailing list tribal knowledge. > > > Message severity chosen for consistency with XFS -- "alert" makes dmesg > > > produce nice red background which should get the point across. > > ... > > > I intend to ask for inclusion of this one (or an equivalent) in 4.9, either > > > in Debian or via GregKH -- while for us kernels "that old" are history, > > > regular users expect stable releases to be free of known serious data loss > > > bugs. > > > There are some raid56 fixes in 4.12, but IIRC the write hole is still > unfixed so the warning is still valid even for 4.12. It would be easier > to get the patch to 4.4 or 4.9 once it's in Linus tree. I've taken pre-4.12 for a spin (mason/for-linus-4.12 atop of midway merge window v4.11-10603-g13e098814037), and it indeed succeeds on a number of tests I've thrown at it that 4.11 fails. My tests were not exhaustive (corruption at rest only, with no unclean shutdowns) but it looks good so far. So implementation bugs are getting ironed out; 4.9..4.11 had no improvements but 4.12 is a nice step forward. Still dies horribly on 32-bit. Write hole is pretty nasty for metadata (likely to cause total filesystem loss) but when on -draid{5,6} -mraid{1,10} it's nowhere as bad. So for 4.12 it might be ok to put up big warnings only for metadata. On the other hand, data loss limited to 1-2 files is still data loss -- CoW is supposed to never damage files already written. A real fix is obviously better than slapping on warnings. > > And here the severity is "critical -- causes serious data loss". As for 4.9, though, the Debian release is coming very soon, and kernel updates there require far more effort than the usual every 4-8 days GregKH point release. So I'd need to harass Ben Hutchings really soon (and possibly it's already too late). > > > + if ((fs_info->avail_data_alloc_bits | > > > + fs_info->avail_metadata_alloc_bits | > > > + fs_info->avail_system_alloc_bits) & > > > + BTRFS_BLOCK_GROUP_RAID56_MASK) { > > > + btrfs_alert(fs_info, > > > + "btrfs RAID5/6 is EXPERIMENTAL and has known data-loss bugs"); > > Doing this in the kernel should be better than in userspace (like > > https://patchwork.kernel.org/patch/9450035/) as it can deal with a future > > kernel with working RAID5/6 on old -progs; but if you prefer, I can finish > > that patch and request its inclusion in Debian stretch -progs instead or in > > addition to the above warning in the kernel. > > I'll have look again. I haven't addresses the concerns for the -progs patch -- at this moment having you look there would be a waste of time. So the question is: do you want the warning to be in kernel (where it won't get out of sync), progs (where it might be easier to notice) or both? Meow!
Hi, On 2017-05-09 03:49, Adam Borowski wrote: > Write hole is pretty nasty for metadata (likely to cause total filesystem > loss) but when on -draid{5,6} -mraid{1,10} it's nowhere as bad. So for 4.12 > it might be ok to put up big warnings only for metadata. On the other hand, > data loss limited to 1-2 files is still data loss -- CoW is supposed to > never damage files already written. > > A real fix is obviously better than slapping on warnings. The write hole is a real concern ? Only in the last year the linux MD raid implementation has gained a journal to avoid this problem. This means the in the last 15-years (or even more) this problem was here but its severity was "acceptable". What I am trying to say, is that until the kernel 4.12, btrfs had several bugs which prevent to work even the basic raid5/6 functionalities (i.e. rebuild). This is a thing that the user should be warned. Because these kinds of bugs are unexpected by a "stable filesystem". But the raid5/6 write hole is "defect" of all raid5/6 implementation. Until ZFS and the last iteration of MD, the real/only mitigation was a battery backup. In this BTRFS is not worse (nor better) than its competitor (xfs/ext3,4....). I am inclined to think that a warning for the write hole is a bit excessive. BR G.Baroncelli
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e54844767fe5..e7f91f70e149 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3083,6 +3083,14 @@ int open_ctree(struct super_block *sb, btrfs_set_opt(fs_info->mount_opt, SSD); } + if ((fs_info->avail_data_alloc_bits | + fs_info->avail_metadata_alloc_bits | + fs_info->avail_system_alloc_bits) & + BTRFS_BLOCK_GROUP_RAID56_MASK) { + btrfs_alert(fs_info, + "btrfs RAID5/6 is EXPERIMENTAL and has known data-loss bugs"); + } + /* * Mount does not set all options immediately, we can do it now and do * not have to wait for transaction commit
Too many people come complaining about losing their data -- and indeed, there's no warning outside a wiki and the mailing list tribal knowledge. Message severity chosen for consistency with XFS -- "alert" makes dmesg produce nice red background which should get the point across. Signed-off-by: Adam Borowski <kilobyte@angband.pl> --- Resent alone, the other patch in the original series (dropping the incompat flag when no longer needed) is pointless on its own. I intend to ask for inclusion of this one (or an equivalent) in 4.9, either in Debian or via GregKH -- while for us kernels "that old" are history, regular users expect stable releases to be free of known serious data loss bugs. fs/btrfs/disk-io.c | 8 ++++++++ 1 file changed, 8 insertions(+)