
parent transid verify failures on 2.6.39

Message ID 20110623195409.GA21007@dhcp231-156.rdu.redhat.com (mailing list archive)
State New, archived

Commit Message

Josef Bacik June 23, 2011, 7:54 p.m. UTC
On Wed, Jun 22, 2011 at 09:45:20PM -0400, Chris Mason wrote:
> Excerpts from Andrej Podzimek's message of 2011-06-22 18:42:28 -0400:
> > 
> > Could I try your hack, pretty please? If there's any chance it could either resolve this problem
> > 
> >     http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10683.html ,
> > 
> > or at least restore the data from the filesystem, then I'd like to give it a go. Waiting for the new btrfsck is currently not an option for me :-)
> 
> It looks like your box is failing to read the extent allocation tree.
> We don't allow the mount to proceed without that tree, but you don't
> actually need it for a readonly mount (to copy things off).
> 
> Josef, is your hack just a mount option to make -o readonly skip the
> extent allocation tree?
> 
> I can put this into my -o recovery patch and we can give it a try.
> 

Here's the patch; you _have_ to mount read-only (-o ro).  Basically, it
searches all the mirrors, finds the one with the newest generation number and
just uses that one, assuming that it will be the closest one to what we want.
This has worked relatively well for the people who have used it, so hopefully it
will work for you.  Thanks,

Josef
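The mirror selection Josef describes can be modeled in a few lines of userspace C. This is a simplified sketch, not kernel code: struct mirror_copy and pick_mirror() are hypothetical names, copies are indexed from 0 here, and unlike the patch (which reads btrfs_header_generation() even when the read failed) it only considers copies whose read succeeded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, userspace-only model of one on-disk copy of a tree block. */
struct mirror_copy {
	bool read_ok;        /* did the read (and checksum) succeed? */
	uint64_t generation; /* generation stored in the block header */
};

/*
 * Return the index of the copy to use, or -1 if no copy could be read.
 * *transid_checked is set to false when we fell back to the newest copy
 * even though its generation did not match (the patch's "uptodate = 0" pass).
 */
static int pick_mirror(const struct mirror_copy *copies, size_t num_copies,
		       uint64_t parent_transid, bool *transid_checked)
{
	int good_mirror = -1;
	uint64_t newest = 0;

	for (size_t i = 0; i < num_copies; i++) {
		if (!copies[i].read_ok)
			continue;
		/* Normal case: a copy with the expected generation is used as-is. */
		if (copies[i].generation == parent_transid) {
			*transid_checked = true;
			return (int)i;
		}
		/* Otherwise remember the copy with the highest generation seen. */
		if (good_mirror < 0 || copies[i].generation > newest) {
			good_mirror = (int)i;
			newest = copies[i].generation;
		}
	}
	/* Fallback: use the newest readable copy without the transid check. */
	*transid_checked = false;
	return good_mirror;
}
```

With copies at generations 1545 and 2134 and a wanted transid of 2135, the sketch falls back to the 2134 copy. The trade-off is the one Josef names: the newest stale copy is only assumed to be the closest to what the parent expects, which is why a read-only mount is required.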

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Daniel Witzel June 23, 2011, 9:11 p.m. UTC | #1
Well, still no cigar; it didn't even change the error output. Thank you, though, for
at least trying to help. Here is the error info with your patch applied to
/fs/btrfs/disk-io.c:

mount -o ro /dev/sdf1 (same for c1,d1,etc) /btrfs (dmesg output)
[ 1647.330104] btrfs: open_ctree failed
[ 1683.328038] device label 1TB0 devid 1 transid 2135 /dev/sdf1
[ 1683.344059] parent transid verify failed on 2206281838592 wanted 2135 found 1545
[ 1683.349109] btrfs: open_ctree failed

btrfsck -s 0 (and 1) /dev/sdf1(c1,d1,etc):
localhost btrfs-progs-unstable # ./btrfsck -s 1 /dev/sdc1
using SB copy 1, bytenr 67108864
failed to read /dev/sr0
failed to read /dev/sr0
parent transid verify failed on 2206281838592 wanted 2135 found 1545
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

(dmesg output)
[ 1825.231434] device label 1TB0 devid 1 transid 2135 /dev/sdf1
[ 1825.235292] device label 1TB0 devid 2 transid 2134 /dev/sde1
[ 1825.238560] device label 1TB0 devid 3 transid 2135 /dev/sdd1
[ 1825.241176] device label 1TB0 devid 4 transid 2135 /dev/sdc1
[ 1825.244681] device label 1TB0 devid 5 transid 2135 /dev/sdb1


Gentoo baselayout 2, kernel 2.6.39-r1, btrfs-progs-unstable: cmason git master
branch, 5-disk USB RAID-0 array



Josef Bacik June 23, 2011, 9:35 p.m. UTC | #2
On 06/23/2011 05:11 PM, Daniel Witzel wrote:
> Well, still no cigar; it didn't even change the error output. Thank you, though, for
> at least trying to help. Here is the error info with your patch applied to
> /fs/btrfs/disk-io.c:
>
> mount -o ro /dev/sdf1 (same for c1,d1,etc) /btrfs (dmesg output)
> [ 1647.330104] btrfs: open_ctree failed
> [ 1683.328038] device label 1TB0 devid 1 transid 2135 /dev/sdf1
> [ 1683.344059] parent transid verify failed on 2206281838592 wanted 2135 found 1545
> [ 1683.349109] btrfs: open_ctree failed

You didn't apply it right then, because you shouldn't see these errors 
anymore.  Thanks,

Josef
Andrej Podzimek June 24, 2011, 1:30 a.m. UTC | #3
>>> Could I try your hack, pretty please? If there's any chance it could either resolve this problem
>>>
>>>      http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10683.html ,
>>>
>>> or at least restore the data from the filesystem, then I'd like to give it a go. Waiting for the new btrfsck is currently not an option for me :-)
>>
>> It looks like your box is failing to read the extent allocation tree.
>> We don't allow the mount to proceed without that tree, but you don't
>> actually need it for a readonly mount (to copy things off).
>>
>> Josef, is your hack just a mount option to make -o readonly skip the
>> extent allocation tree?
>>
>> I can put this into my -o recovery patch and we can give it a try.
>>
>
> Here's the patch, you _have_ to mount -o readonly.  Basically what it does is
> search all the mirrors and finds the one with the newest generation number and
> just uses that one, assuming that it will be the closest one to what we want.
> This has worked relatively well for the people who have used it, so hopefully it
> will work for you.  Thanks,

Great, it works for me. I could mount the RAID0 root partition from an ArchLinux live CD (after patching, compiling and replacing the btrfs module first). Thank you very much!

My RAID1 /boot partition looks odd, and the mount froze (with an oops) when I tried to access it this way. Fortunately, /boot doesn't matter that much; it will be easy to recover.

Andrej
Daniel Witzel June 24, 2011, 3:46 p.m. UTC | #4
Well, here is what I'm doing:

patch -p1 < disk-io.patch  
output: "patching file fs/btrfs/disk-io.c"
rmmod btrfs 
rmmod lzo_compress
make -j3
make -j3 modules
make -j3 modules_install
cp arch/x86_64/boot/bzImage /boot/linux-next
depmod -a

(reboot)
modprobe btrfs
btrfs device scan
btrfs filesystem show (all drives show)
mount -o ro /dev/sdb1 /btrfs

And the output is:

[ 4364.813453] parent transid verify failed on 2206281838592 wanted 2135 found 1545
[ 4364.817093] btrfs: open_ctree failed

I checked the resulting disk-io.c file and the changes were merged. As you can
see, I rebuilt my kernel and modules, rebooted and still got this error. Is there
a step I'm missing?

Thanks




Daniel Witzel June 28, 2011, 3:46 p.m. UTC | #5
Earlier I tried the read-only patch with no result. Josef said I must be
applying it wrong because the error I get is not possible with the patch applied.
I tried again with no luck and posted my steps for review. Well, here I am a few
days later with the following questions:

1) If my steps are correct, what else could be the problem?
2) If my steps are wrong, what do I need to do to get it right?

Any help would be awesome.

Thanks
Dan Witzel



Mitch Harder June 28, 2011, 4:44 p.m. UTC | #6
On Tue, Jun 28, 2011 at 10:46 AM, Daniel Witzel <dannyboy48888@gmail.com> wrote:
> Earlier I tried the read-only patch with no result. Josef said I must be
> applying it wrong because the error I get is not possible with the patch applied.
> I tried again with no luck and posted my steps for review. Well, here I am a few
> days later with the following questions:
>
> 1) If my steps are correct, what else could be the problem?
> 2) If my steps are wrong, what do I need to do to get it right?
>
> Any help would be awesome
>
> Thanks
> Dan Witzel
>

I just used this patch yesterday to help with a slightly different corruption.

I know the patch didn't apply cleanly for me, and I had to massage it.

You may want to manually audit disk-io.c to make sure the entire patch
is applied.

I know if I try to apply this patch to my 2.6.39.1 kernel, it fails.

# patch -p1 --dry-run < /mnt/local/local/dontpanic/parent-transid-verify-failures-on-2.6.39.patch
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 296.
Hunk #3 succeeded at 321 (offset -2 lines).
Hunk #4 succeeded at 331 (offset -2 lines).
Hunk #5 succeeded at 353 (offset -2 lines).
Hunk #6 succeeded at 1993 (offset -14 lines).
Hunk #7 succeeded at 2629 (offset 3 lines).
1 out of 7 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
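One way to do the manual audit Mitch suggests is to check that each line the patch adds actually appears in disk-io.c. The sketch below is a rough heuristic, not a substitute for reading the .rej file: the marker strings are copied from the patch's added lines, while patch_markers and count_missing_markers() are hypothetical names. The source text is passed in as a string (for example, disk-io.c read into a buffer with fopen()/fread()).

```c
#include <stddef.h>
#include <string.h>

/* Lines added by the patch; if one is missing, its hunk probably did not apply. */
static const char *const patch_markers[] = {
	"u64 parent_transid, int uptodate)", /* new verify_parent_transid() signature */
	"if (!uptodate) {",                  /* hunk #2: silent fallback */
	"good_mirror = mirror_num;",         /* remember the newest mirror */
	"mirror_num = good_mirror;",         /* retry it once all mirrors failed */
};

/* Return how many markers are absent from the given source text. */
static size_t count_missing_markers(const char *source,
				    const char *const *markers, size_t n)
{
	size_t missing = 0;

	for (size_t i = 0; i < n; i++)
		if (strstr(source, markers[i]) == NULL)
			missing++;
	return missing;
}
```

A result of 1 on a tree where "Hunk #2 FAILED" was reported would match the dry run above: the early return that skips the transid check on the fallback pass is the piece that is missing.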
Daniel Witzel June 28, 2011, 5:04 p.m. UTC | #7
Thanks for the reply. I copied the patch from the line "diff ..." onward, made
a fresh kernel tree and got the following (same on 2.6.39-r1 and r2):

localhost linux# patch -p1 --dry-run --verbose < disk-io.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
|index c650a1d..53e330e 100644
|--- a/fs/btrfs/disk-io.c
|+++ b/fs/btrfs/disk-io.c
--------------------------
Patching file fs/btrfs/disk-io.c using Plan A...
Hunk #1 succeeded at 281.
Hunk #2 succeeded at 296.
Hunk #3 succeeded at 328.
Hunk #4 succeeded at 338.
Hunk #5 succeeded at 360.
Hunk #6 succeeded at 2012.
Hunk #7 succeeded at 2631.
done


A perfect patch job if I say so :)

Any other ideas are welcome.

Dan Witzel



Daniel Witzel June 28, 2011, 5:31 p.m. UTC | #8
Thanks for the reply. I copied the patch from the "diff" line onwards and patched
against a fresh kernel 2.6.39-r1 and r2 tree, with the same result:

localhost linux # patch --dry-run --verbose -p1 < disk-io.patch 
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
|index c650a1d..53e330e 100644
|--- a/fs/btrfs/disk-io.c
|+++ b/fs/btrfs/disk-io.c
--------------------------
Patching file fs/btrfs/disk-io.c using Plan A...
Hunk #1 succeeded at 281.
Hunk #2 succeeded at 296.
Hunk #3 succeeded at 328.
Hunk #4 succeeded at 338.
Hunk #5 succeeded at 360.
Hunk #6 succeeded at 2012.
Hunk #7 succeeded at 2631.
done


Same problem. Any other ideas would be great.

Dan Witzel




Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c650a1d..53e330e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -281,7 +281,7 @@  static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
  * in the wrong place.
  */
 static int verify_parent_transid(struct extent_io_tree *io_tree,
-				 struct extent_buffer *eb, u64 parent_transid)
+				 struct extent_buffer *eb, u64 parent_transid, int uptodate)
 {
 	struct extent_state *cached_state = NULL;
 	int ret;
@@ -296,6 +296,11 @@  static int verify_parent_transid(struct extent_io_tree *io_tree,
 		ret = 0;
 		goto out;
 	}
+	if (!uptodate) {
+		ret = 0;
+		goto out;
+	}
+
 	if (printk_ratelimit()) {
 		printk("parent transid verify failed on %llu wanted %llu "
 		       "found %llu\n",
@@ -323,6 +328,9 @@  static int btree_read_extent_buffer_pages(struct btrfs_root *root,
 	int ret;
 	int num_copies = 0;
 	int mirror_num = 0;
+	int uptodate = 1;
+	int good_mirror = 0;
+	u64 generation = 0;
 
 	clear_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags);
 	io_tree = &BTRFS_I(root->fs_info->btree_inode)->io_tree;
@@ -330,9 +338,14 @@  static int btree_read_extent_buffer_pages(struct btrfs_root *root,
 		ret = read_extent_buffer_pages(io_tree, eb, start, 1,
 					       btree_get_extent, mirror_num);
 		if (!ret &&
-		    !verify_parent_transid(io_tree, eb, parent_transid))
+		    !verify_parent_transid(io_tree, eb, parent_transid, uptodate))
 			return ret;
 
+		if (btrfs_header_generation(eb) > generation) {
+			good_mirror = mirror_num;
+			generation = btrfs_header_generation(eb);
+		}
+
 		/*
 		 * This buffer's crc is fine, but its contents are corrupted, so
 		 * there is no reason to read the other copies, they won't be
@@ -347,8 +360,11 @@  static int btree_read_extent_buffer_pages(struct btrfs_root *root,
 			return ret;
 
 		mirror_num++;
-		if (mirror_num > num_copies)
-			return ret;
+		if (mirror_num > num_copies) {
+			mirror_num = good_mirror;
+			uptodate = 0;
+			continue;
+		}
 	}
 	return -EIO;
 }
@@ -1996,11 +2012,13 @@  struct btrfs_root *open_ctree(struct super_block *sb,
 		goto fail_block_groups;
 	}
 
+	/*
 	ret = btrfs_read_block_groups(extent_root);
 	if (ret) {
 		printk(KERN_ERR "Failed to read block groups: %d\n", ret);
 		goto fail_block_groups;
 	}
+	*/
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
 					       "btrfs-cleaner");
@@ -2613,12 +2631,7 @@  int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid)
 
 	ret = extent_buffer_uptodate(&BTRFS_I(btree_inode)->io_tree, buf,
 				     NULL);
-	if (!ret)
-		return ret;
-
-	ret = verify_parent_transid(&BTRFS_I(btree_inode)->io_tree, buf,
-				    parent_transid);
-	return !ret;
+	return ret;
 }
 
 int btrfs_set_buffer_uptodate(struct extent_buffer *buf)
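The "parent transid verify failed on X wanted Y found Z" message seen throughout the thread is printed by verify_parent_transid(): X is the block's start, Y the transid its parent pointer expects, and Z the generation actually stored in the block header. The userspace sketch below models the patched check with its new uptodate argument. check_parent_transid() is a hypothetical name, and the kernel's extent locking and cached-state handling are left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Model of the patched verify_parent_transid(): returns 0 if the block
 * is accepted and 1 if it is rejected as stale.
 */
static int check_parent_transid(uint64_t bytenr, uint64_t header_gen,
				uint64_t parent_transid, int uptodate)
{
	/* Generations match: the block is what the parent expects. */
	if (header_gen == parent_transid)
		return 0;
	/* Fallback pass (uptodate == 0): accept the stale block silently. */
	if (!uptodate)
		return 0;
	/* Normal pass: report the mismatch and reject the block. */
	fprintf(stderr, "parent transid verify failed on %" PRIu64
		" wanted %" PRIu64 " found %" PRIu64 "\n",
		bytenr, parent_transid, header_gen);
	return 1;
}
```

Reading the hunks, uptodate starts at 1 in btree_read_extent_buffer_pages() and is cleared only after every mirror has been tried, so as far as this model goes the (rate-limited) message may still appear on the first pass before the fallback accepts the newest copy.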