Message ID | 1401357597-9494-2-git-send-email-wangsl.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | David Sterba |
On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
> If the checksum root is corrupted, fsck will get a segmentation fault.
> This is because if we fail to load the checksum root, the root's node
> is NULL, which causes NULL pointer dereferences later.
>
> To fix this problem, we do something like extent tree rebuilding:
> allocate a new node and clear its uptodate flag. We do a sanity check
> before fsck goes on.

I'm a bit worried about recommending --init-csum-tree, though in this
case there's not much else left to do. A filesystem with an initialized
csum tree will mount, but reading non-inline data will produce 'csum
missing' errors.

> --- a/cmds-check.c
> +++ b/cmds-check.c
> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>  		ret = -EIO;
>  		goto close_out;
>  	}
> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
> +		ret = -EIO;
> +		goto close_out;

So this should prevent segfaults due to a missing csum tree, fine. The
error message can copy what the broken extent tree reports a few lines
above.

And now that I'm looking at the other extent_buffer_uptodate(tree) checks
in the function: for clarity, each root check should be done separately
and followed by a message that says which tree is broken.

The idea behind this is to improve the error reporting and then
document what types of breakage can be fixed and how.

I'm CCing Chris, as this is a matter of design and direction of fsck;
more opinions are desirable.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
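The suggestion above (check each root on its own and name the broken tree) can be sketched as a table-driven loop. This is a minimal, self-contained sketch, not btrfs-progs code: `struct fake_buffer`, `struct root_check`, the helper names and the message format are all hypothetical. Only the idea of testing each root node's uptodate state and returning -EIO comes from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for btrfs-progs' struct extent_buffer; only the
 * "uptodate" state matters for this sketch. */
struct fake_buffer {
	int uptodate;
};

/* One entry per tree root that the checker depends on (hypothetical). */
struct root_check {
	const char *name;               /* tree name used in the message */
	const struct fake_buffer *node; /* root node; may be NULL or stale */
};

/* A missing node is treated the same as one that failed to read. */
static int root_node_usable(const struct fake_buffer *node)
{
	return node != NULL && node->uptodate;
}

/*
 * Check each root separately and name every broken tree instead of
 * printing one combined message. Returns the number of broken trees.
 */
static int report_broken_roots(const struct root_check *checks, size_t n)
{
	int broken = 0;

	for (size_t i = 0; i < n; i++) {
		if (root_node_usable(checks[i].node))
			continue;
		fprintf(stderr, "ERROR: %s tree root is corrupted\n",
			checks[i].name);
		broken++;
	}
	return broken;
}

/* Map the result onto the -EIO convention used in cmd_check(). */
static int check_critical_roots(const struct root_check *checks, size_t n)
{
	return report_broken_roots(checks, n) ? -EIO : 0;
}
```

This sketch reports every broken tree before returning, rather than stopping at the first one. That is a design choice made here for illustration, not something the thread decides.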
On 06/03/2014 01:27 AM, David Sterba wrote:
> On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
>> If the checksum root is corrupted, fsck will get a segmentation fault.
>> This is because if we fail to load the checksum root, the root's node
>> is NULL, which causes NULL pointer dereferences later.
>>
>> To fix this problem, we do something like extent tree rebuilding:
>> allocate a new node and clear its uptodate flag. We do a sanity check
>> before fsck goes on.
> I'm a bit worried about recommending --init-csum-tree, though in this
> case there's not much else left to do. A filesystem with an initialized
> csum tree will mount, but reading non-inline data will produce 'csum
> missing' errors.
Agree.
>> --- a/cmds-check.c
>> +++ b/cmds-check.c
>> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>>  		ret = -EIO;
>>  		goto close_out;
>>  	}
>> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
>> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
>> +		ret = -EIO;
>> +		goto close_out;
> So this should prevent segfaults due to a missing csum tree, fine. The
> error message can copy what the broken extent tree reports a few lines
> above.
>
> And now that I'm looking at the other extent_buffer_uptodate(tree) checks
> in the function: for clarity, each root check should be done separately
> and followed by a message that says which tree is broken.
Normally, extent_buffer_uptodate(tree) is checked right after reading.
We need the check in fsck because the extent tree and the csum tree may
have to be reinitialized.

Checking it again makes sure the root node has been set up properly and
fsck can go further.
>
> The idea behind this is to improve the error reporting and then
> document what types of breakage can be fixed and how.
>
> I'm CCing Chris, as this is a matter of design and direction of fsck;
> more opinions are desirable.
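Wang's explanation, that the uptodate test runs a second time because the extent and csum trees may have been reinitialized in between, can be modelled as follows. Everything here is hypothetical (`struct fake_root`, `reinit_tree()`, `prepare_root()`). `reinit_tree()` only stands in for what an option such as --init-csum-tree is meant to achieve and is not the btrfs-progs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a tree root and its root node. */
struct fake_buffer {
	bool uptodate;
};

struct fake_root {
	struct fake_buffer *node; /* stale placeholder after a partial open */
};

/* Hypothetical stand-in for rebuilding a tree from scratch: swap the
 * placeholder for a freshly created, valid node. */
static int reinit_tree(struct fake_root *root)
{
	struct fake_buffer *fresh = calloc(1, sizeof(*fresh));

	if (!fresh)
		return -ENOMEM;
	fresh->uptodate = true;
	free(root->node);
	root->node = fresh;
	return 0;
}

/*
 * Runs after the optional re-initialisation. A placeholder that was not
 * rebuilt is still marked stale, so the check stops here with -EIO
 * instead of letting later passes walk an empty tree.
 */
static int prepare_root(struct fake_root *root, bool reinit_requested)
{
	if (reinit_requested) {
		int ret = reinit_tree(root);

		if (ret)
			return ret;
	}
	if (!root->node || !root->node->uptodate)
		return -EIO;
	return 0;
}
```

The same root passes or fails the second check depending only on whether it was rebuilt first. That ordering is the point the reply makes.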
On Tue, Jun 03, 2014 at 11:25:49AM +0800, Wang Shilong wrote:
> On 06/03/2014 01:27 AM, David Sterba wrote:
> >On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
> >>If the checksum root is corrupted, fsck will get a segmentation fault.
> >>This is because if we fail to load the checksum root, the root's node
> >>is NULL, which causes NULL pointer dereferences later.
> >>
> >>To fix this problem, we do something like extent tree rebuilding:
> >>allocate a new node and clear its uptodate flag. We do a sanity check
> >>before fsck goes on.
> >I'm a bit worried about recommending --init-csum-tree, though in this
> >case there's not much else left to do. A filesystem with an initialized
> >csum tree will mount, but reading non-inline data will produce 'csum
> >missing' errors.
> Agree.

Are you ok with removing the "rerun with --init-csum-tree option" part
of the message?

> >>--- a/cmds-check.c
> >>+++ b/cmds-check.c
> >>@@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
> >> 		ret = -EIO;
> >> 		goto close_out;
> >> 	}
> >>+	if (!extent_buffer_uptodate(info->csum_root->node)) {
> >>+		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
> >>+		ret = -EIO;
> >>+		goto close_out;
> >So this should prevent segfaults due to a missing csum tree, fine. The
> >error message can copy what the broken extent tree reports a few lines
> >above.
> >
> >And now that I'm looking at the other extent_buffer_uptodate(tree) checks
> >in the function: for clarity, each root check should be done separately
> >and followed by a message that says which tree is broken.
> Normally, extent_buffer_uptodate(tree) is checked right after reading.
> We need the check in fsck because the extent tree and the csum tree may
> have to be reinitialized.
>
> Checking it again makes sure the root node has been set up properly and
> fsck can go further.

Yeah, I see how it works now, thanks.

I've reorganized the patches in integration so the ones for fsck are
grouped together. Fsck is scary and obviously needs more reviews, so the
patches will be pushed towards release branches based on that: reviews
or tests, so to say. I appreciate your work in that area and hope you
understand the slow progress with your patches.
On 06/04/2014 12:21 AM, David Sterba wrote:
> On Tue, Jun 03, 2014 at 11:25:49AM +0800, Wang Shilong wrote:
>> On 06/03/2014 01:27 AM, David Sterba wrote:
>>> On Thu, May 29, 2014 at 05:59:57PM +0800, Wang Shilong wrote:
>>>> If the checksum root is corrupted, fsck will get a segmentation fault.
>>>> This is because if we fail to load the checksum root, the root's node
>>>> is NULL, which causes NULL pointer dereferences later.
>>>>
>>>> To fix this problem, we do something like extent tree rebuilding:
>>>> allocate a new node and clear its uptodate flag. We do a sanity check
>>>> before fsck goes on.
>>> I'm a bit worried about recommending --init-csum-tree, though in this
>>> case there's not much else left to do. A filesystem with an initialized
>>> csum tree will mount, but reading non-inline data will produce 'csum
>>> missing' errors.
>> Agree.
> Are you ok with removing the "rerun with --init-csum-tree option" part
> of the message?
That's not good; I agree with your point here.
>>>> --- a/cmds-check.c
>>>> +++ b/cmds-check.c
>>>> @@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
>>>>  		ret = -EIO;
>>>>  		goto close_out;
>>>>  	}
>>>> +	if (!extent_buffer_uptodate(info->csum_root->node)) {
>>>> +		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
>>>> +		ret = -EIO;
>>>> +		goto close_out;
>>> So this should prevent segfaults due to a missing csum tree, fine. The
>>> error message can copy what the broken extent tree reports a few lines
>>> above.
>>>
>>> And now that I'm looking at the other extent_buffer_uptodate(tree) checks
>>> in the function: for clarity, each root check should be done separately
>>> and followed by a message that says which tree is broken.
>> Normally, extent_buffer_uptodate(tree) is checked right after reading.
>> We need the check in fsck because the extent tree and the csum tree may
>> have to be reinitialized.
>>
>> Checking it again makes sure the root node has been set up properly and
>> fsck can go further.
> Yeah, I see how it works now, thanks.
>
> I've reorganized the patches in integration so the ones for fsck are
> grouped together. Fsck is scary and obviously needs more reviews, so the
> patches will be pushed towards release branches based on that: reviews
> or tests, so to say. I appreciate your work in that area and hope you
> understand the slow progress with your patches.
That's ok with me, thanks for your review and comments ^_^
```diff
diff --git a/cmds-check.c b/cmds-check.c
index 0e4e042..ad5514e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -6963,6 +6963,11 @@ int cmd_check(int argc, char **argv)
 		ret = -EIO;
 		goto close_out;
 	}
+	if (!extent_buffer_uptodate(info->csum_root->node)) {
+		fprintf(stderr, "Checksum root corrupted, rerun with --init-csum-tree option\n");
+		ret = -EIO;
+		goto close_out;
+	}
 
 	fprintf(stderr, "checking extents\n");
 	ret = check_chunks_and_extents(root);
diff --git a/disk-io.c b/disk-io.c
index 63e153d..bbfd8e7 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -914,6 +914,13 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, u64 root_tree_bytenr,
 		printk("Couldn't setup csum tree\n");
 		if (!(flags & OPEN_CTREE_PARTIAL))
 			return -EIO;
+		/* do the same thing as extent tree rebuilding */
+		fs_info->csum_root->node =
+			btrfs_find_create_tree_block(fs_info->extent_root, 0,
+						     leafsize);
+		if (!fs_info->csum_root->node)
+			return -ENOMEM;
+		clear_extent_buffer_uptodate(NULL, fs_info->csum_root->node);
 	}
 
 	fs_info->csum_root->track_dirty = 1;
```
If the checksum root is corrupted, fsck will get a segmentation fault.
This is because if we fail to load the checksum root, the root's node
is NULL, which causes NULL pointer dereferences later.

To fix this problem, we do something like extent tree rebuilding:
allocate a new node and clear its uptodate flag. We do a sanity check
before fsck goes on.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
---
v1->v2: fix typo for output message.
---
 cmds-check.c | 5 +++++
 disk-io.c    | 7 +++++++
 2 files changed, 12 insertions(+)
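The disk-io.c hunk follows a fallback pattern. On a strict open, a missing csum root is fatal. On a partial open, a placeholder node replaces the NULL pointer and is explicitly marked as not uptodate, so later code can test it instead of dereferencing NULL. The sketch below is minimal and rests on stated assumptions. `OPEN_PARTIAL` only mirrors the role of `OPEN_CTREE_PARTIAL`, and `read_root_node()` is a hypothetical simulation of the tree read. The patch itself obtains the placeholder from `btrfs_find_create_tree_block()` at bytenr 0 and then calls `clear_extent_buffer_uptodate()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag playing the role of OPEN_CTREE_PARTIAL. */
#define OPEN_PARTIAL 0x1u

struct fake_buffer {
	bool uptodate;
};

struct fake_root {
	struct fake_buffer *node;
};

/* Simulated read: returns a valid node, or NULL when the tree is
 * unreadable (allocation failure is folded into "unreadable" here). */
static struct fake_buffer *read_root_node(bool readable)
{
	struct fake_buffer *eb;

	if (!readable)
		return NULL;
	eb = calloc(1, sizeof(*eb));
	if (eb)
		eb->uptodate = true;
	return eb;
}

static int setup_csum_root(struct fake_root *csum, unsigned int flags,
			   bool readable)
{
	csum->node = read_root_node(readable);
	if (csum->node)
		return 0;

	/* Strict open: a broken csum tree is fatal. */
	if (!(flags & OPEN_PARTIAL))
		return -EIO;

	/* Partial open: never leave node == NULL; install a placeholder
	 * and mark it stale so a later uptodate check rejects it. */
	csum->node = calloc(1, sizeof(*csum->node));
	if (!csum->node)
		return -ENOMEM;
	csum->node->uptodate = false;
	return 0;
}
```

After a partial open, callers always get a non-NULL node. Whether it can be used is then decided by the uptodate test that the cmds-check.c hunk adds.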