
btrfsck crashes

Message ID 333721177.250041.1341918535801.JavaMail.open-xchange@email.1und1.de
State New, archived

Commit Message

haveaniceday@cv-sv.de July 10, 2012, 11:08 a.m. UTC
This code should detect the problem with an assertion failure instead of a SIGSEGV.
...
Csum didn't match
btrfsck: btrfsck.c:1177: walk_down_tree: Assertion `!(1)' failed.
Aborted
...
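The text "Assertion `!(1)' failed" suggests that, in this btrfsck build, WARN_ON(1) ends up in an assert() call instead of only printing a warning. The sketch below assumes a kernel-compatibility layer that maps BUG_ON(c) to assert(!(c)) and WARN_ON(c) to BUG_ON(c). This model is consistent with the output but is not quoted from the btrfs-progs headers, and the helper names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Assumed model: kernel-style macros mapped onto assert(). With this
 * expansion, WARN_ON(1) becomes assert(!(1)), and glibc reports the
 * condition text "!(1)" when it fails. */
#define BUG_ON(c) assert(!(c))
#define WARN_ON(c) BUG_ON(c)

/* Stringify the condition text assert() would report for WARN_ON(c). */
#define MODEL_STR_(x) #x
#define WARN_ON_TEXT(c) MODEL_STR_(!(c))

/* Hypothetical helper: does a reported condition match WARN_ON(1)? */
int matches_reported_condition(const char *reported)
{
    return strcmp(reported, WARN_ON_TEXT(1)) == 0;
}

/* WARN_ON(0) expands to assert(!(0)), which holds, so this returns. */
int warn_on_false_is_harmless(void)
{
    WARN_ON(0);
    return 1;
}
```

Under this model, assert() is compiled out when NDEBUG is defined. The abort, and therefore the "detection", would then depend on how btrfsck was built.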


The next errors I get are:

....
checking fs roots
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
CVCV path->nodes[*level] is 0!
root 5 inode 265 errors 2000
        unresolved ref dir 2658782 index 3 namelen 12 name aquota.group filetype 0 error 3
        unresolved ref dir 2914579 index 3 namelen 12 name aquota.group filetype 0 error 3
root 5 inode 266 errors 2000
        unresolved ref dir 2658782 index 4 namelen 11 name aquota.user filetype 0 error 3
        unresolved ref dir 2914579 index 4 namelen 11 name aquota.user filetype 0 error 3
root 5 inode 285 errors 2000
        unresolved ref dir 2658783 index 3 namelen 3 name awk filetype 0 error 3
        unresolved ref dir 2914580 index 3 namelen 3 name awk filetype 0 error 3
root 5 inode 286 errors 2000
        unresolved ref dir 2658783 index 16 namelen 3 name csh filetype 0 error 3
        unresolved ref dir 2914580 index 16 namelen 3 name csh filetype 0 error 3
root 5 inode 287 errors 2000
        unresolved ref dir 2658783 index 27 namelen 13 name dnsdomainname filetype 0 error 3
        unresolved ref dir 2914580 index 27 namelen 13 name dnsdomainname filetype 0 error 3
root 5 inode 288 errors 2000
        unresolved ref dir 2658783 index 28 namelen 10 name domainname filetype 0 error 3
        unresolved ref dir 2914580 index 28 namelen 10 name domainname filetype 0 error 3
root 5 inode 289 errors 2000
        unresolved ref dir 2658783 index 34 namelen 2 name ex filetype 0 error 3
        unresolved ref dir 2914580 index 34 namelen 2 name ex filetype 0 error 3
root 5 inode 290 errors 2000
        unresolved ref dir 2658783 index 48 namelen 2 name ip filetype 0 error 3
        unresolved ref dir 2914580 index 48 namelen 2 name ip filetype 0 error 3
root 5 inode 291 errors 2000
        unresolved ref dir 2658783 index 54 namelen 3 name ksh filetype 0 error 3
        unresolved ref dir 2914580 index 54 namelen 3 name ksh filetype 0 error 3
root 5 inode 292 errors 2000
        unresolved ref dir 2658783 index 63 namelen 4 name mail filetype 0 error 3
        unresolved ref dir 2914580 index 63 namelen 4 name mail filetype 0 error 3
...
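The "checksum verify failed" and "Csum didn't match" lines mean that a metadata block's stored checksum does not match the checksum recomputed over its contents. btrfs's default checksum algorithm is CRC-32C. The sketch below is a plain bitwise CRC-32C plus a hypothetical csum_matches() helper, and it only illustrates the comparison. It does not reproduce which bytes of a tree block btrfs covers, or how btrfsck formats the "wanted"/"found" values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) with the
 * usual all-ones initial value and final inversion. */
uint32_t crc32c(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical helper: compare a stored checksum with one recomputed over
 * the block contents. */
int csum_matches(const void *block, size_t len, uint32_t stored)
{
    return crc32c(block, len) == stored;
}
```

Real implementations use a lookup table or the SSE4.2 crc32 instruction. The bitwise loop is used here only because it is easy to audit.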
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Anand Jain July 11, 2012, 7:13 a.m. UTC | #1
If this is a deliberate corruption, can you please share the test case?
If not, have you tried mounting with the recovery option and then running
scrub? Scrub would be the preferred choice over btrfsck.



On 10/07/12 19:08, haveaniceday@cv-sv.de wrote:
> This code should detect the problem with an assertion failure instead of a SIGSEGV.
> ...
> Csum didn't match
> btrfsck: btrfsck.c:1177: walk_down_tree: Assertion `!(1)' failed.
> Aborted
> ...
>
>
> --- btrfsck.c   2012-07-10 10:23:24.781622144 +0200
> +++ btrfsck.c   2012-07-10 12:59:00.120146266 +0200
> @@ -1173,7 +1173,7 @@
>                  WARN_ON(*level>= BTRFS_MAX_LEVEL);
>                  cur = path->nodes[*level];
>
> -               if (btrfs_header_level(cur) != *level)
> +               if (! cur || btrfs_header_level(cur) != *level)
>                          WARN_ON(1);
>
>                  if (path->slots[*level]>= btrfs_header_nritems(cur))
>
>   I tried to work around this error with the code below. The next errors
> it reports are also below.
>
>
> --- btrfsck.c   2012-07-10 10:23:24.781622144 +0200
> +++ btrfsck.c   2012-07-10 12:36:51.995996771 +0200
> @@ -1173,8 +1173,13 @@
>                  WARN_ON(*level>= BTRFS_MAX_LEVEL);
>                  cur = path->nodes[*level];
>
> -               if (btrfs_header_level(cur) != *level)
> -                       WARN_ON(1);
> +               if (cur != 0 ) {
> +                       if ( btrfs_header_level(cur) != *level)
> +                               WARN_ON(1);
> +               }else {
> +                       fprintf(stderr, "CVCV path->nodes[*level] is 0!\n");
> +                       break;
> +               }
>
>                  if (path->slots[*level]>= btrfs_header_nritems(cur))
>                          break;
> @@ -1213,7 +1218,11 @@
>                  path->slots[*level] = 0;
>          }
>   out:
> +       if ( path->nodes[*level] != 0 ){
>          path->slots[*level] = btrfs_header_nritems(path->nodes[*level]);
> +       } else {
> +       path->slots[*level] = 0;
> +       }
>          return 0;
>   }
>
> [...]
haveaniceday@cv-sv.de July 11, 2012, 8:36 a.m. UTC | #2
Anand Jain <Anand.Jain@oracle.com> wrote on 11 July 2012 at 09:13:

>
>
>   If this is a deliberate corruption can you pls share the test-case ?
No. It's a real-life corruption on a file system used to back up several servers.

That's also why basic files like aquota, awk, etc. show up in the errors.

I expect it would be very hard to build a reproducible test case for this error.
(For the usage pattern, see the PS below.)

>   if not have you tried mount with recovery and the scrub. ? scrub
>   would be preferred choice over btrfsck.
I can scrub this file system.
But isn't it a good test to try some recovery? Eventually a stable btrfs should
handle corruptions like this without a SIGSEGV or data loss.
I expect a real-life recovery to exercise stranger cases than the test cases
do :)

So it's up to you and the other btrfs developers how far we should pursue this issue.
In the meantime I made an image of the corrupted file system, so multiple recovery
attempts are possible.

Best regards,
Christian

PS: I would bet that my kind of usage is a very good stress test for btrfs.

- A large btrfs file system, "/backup", with compression enabled.

Content of the file system:
- ./server1 .... /server5  as directories
- for each server the directory has a structure like this:
  backup-YYYY-DD-MM-HH:M
  New backups are created with:
  rsync -axvH --link-dest=/backup/server'n'/backup-...(old, last dir)..
  server:/   /backup/server'n'/backup-YYY-.../.

This generates files with a large number of hard links.
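With --link-dest, rsync hard-links every file that is unchanged since the previous backup instead of copying it. Each backup run therefore adds one more link to every unchanged file. The following is a minimal POSIX sketch of that effect. It assumes a writable /tmp, the function name is hypothetical, and for simplicity all links go into one directory, whereas rsync spreads them across the backup-... directories:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create one file plus `extra` hard links to it in a
 * fresh temporary directory, return the file's link count (st_nlink), and
 * remove everything again. Returns -1 on any error. */
long nlink_after_runs(int extra)
{
    char dir[] = "/tmp/linkdemo-XXXXXX";
    char orig[64], path[64];
    struct stat st;
    long result = -1;
    int made = 0;
    int fd;

    assert(extra >= 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(orig, sizeof orig, "%s/file", dir);

    fd = open(orig, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd >= 0) {
        close(fd);
        /* One link() per simulated backup run of an unchanged file. */
        while (made < extra) {
            snprintf(path, sizeof path, "%s/link%d", dir, made);
            if (link(orig, path) != 0)
                break;
            made++;
        }
        if (made == extra && stat(orig, &st) == 0)
            result = (long)st.st_nlink;

        /* Remove the links and the original file. */
        for (int i = 0; i < made; i++) {
            snprintf(path, sizeof path, "%s/link%d", dir, i);
            unlink(path);
        }
        unlink(orig);
    }
    rmdir(dir);
    return result;
}
```

For example, nlink_after_runs(5) models five further backup runs over an unchanged file. The file ends up with six names: the original plus one per run.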
haveaniceday@cv-sv.de July 12, 2012, 7:08 p.m. UTC | #3
Anand Jain wrote:
>
>
>   If this is a deliberate corruption can you pls share the test-case ?
>   if not have you tried mount with recovery and the scrub. ? scrub
>   would be preferred choice over btrfsck.
>
>
>

Scrub does not fix the problem. (In the log below I replaced the real host name with "myhost".)
What strikes me as strange: the paths reported for the errors point to the same
file names; only part of the "myhost" name differs.

After the scrub, btrfsck still fails with the same crash.


speedy:/home/cv # btrfs scrub status /backup.old
scrub status for fa7034c8-86d4-4aa3-9fde-ecd7051ff43c
         scrub started at Thu Jul 12 20:21:08 2012 and finished after 1495 seconds
         total bytes scrubbed: 115.49GiB with 9 errors
         error details: verify=3 csum=6
         corrected errors: 3, uncorrectable errors: 6, unverified errors: 0

Should I continue with further analysis for bug hunting, or just reformat and move on?

Best regards,
Christian

[ 5059.168649] btrfs: checksum/header error at logical 1532956672 on dev /dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5059.168656] btrfs: checksum/header error at logical 1532956672 on dev /dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5065.581348] btrfs: fixed up error at logical 1532956672 on dev /dev/md3
[ 5065.587844] btrfs: checksum/header error at logical 1532960768 on dev /dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.587851] btrfs: checksum/header error at logical 1532960768 on dev /dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.599317] btrfs: fixed up error at logical 1532960768 on dev /dev/md3
[ 5065.599500] btrfs: checksum/header error at logical 1532964864 on dev /dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.599506] btrfs: checksum/header error at logical 1532964864 on dev /dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.607379] btrfs: fixed up error at logical 1532964864 on dev /dev/md3
[ 5074.964900] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 8756888: metadata leaf (level 0) in tree 5
[ 5074.964907] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 8756888: metadata leaf (level 0) in tree 5
[ 5075.977763] btrfs: unable to fixup (regular) error at logical 2327654400 on dev /dev/md3
[ 5085.133646] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 10854040: metadata leaf (level 0) in tree 5
[ 5085.133653] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 10854040: metadata leaf (level 0) in tree 5
[ 5086.148842] btrfs: unable to fixup (regular) error at logical 2327654400 on dev /dev/md3
[ 6436.036292] btrfs: checksum error at logical 139801403392 on dev /dev/md3, sector 331786256, root 5, inode 2960268, offset 1345069056, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/"myhost".de/statistics/logs/access_ssl_log.processed)
[ 6436.036300] btrfs: unable to fixup (regular) error at logical 139801403392 on dev /dev/md3
[ 6454.615722] btrfs: checksum error at logical 141661282304 on dev /dev/md3, sector 335418832, root 5, inode 2968078, offset 104292352, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/"myhost".no/statistics/logs/error_log)
[ 6454.615736] btrfs: unable to fixup (regular) error at logical 141661282304 on dev /dev/md3
[ 6455.523759] btrfs: checksum error at logical 140794101760 on dev /dev/md3, sector 333725120, root 5, inode 2964438, offset 87449600, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/"myhost".fr/statistics/logs/access_log.processed)
[ 6455.523775] btrfs: unable to fixup (regular) error at logical 140794101760 on dev /dev/md3
[ 6475.865387] btrfs: checksum error at logical 143052115968 on dev /dev/md3, sector 338135304, root 5, inode 3000621, offset 1078595584, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/"otherhost".com/statistics/logs/access_log.processed)
[ 6475.865403] btrfs: unable to fixup (regular) error at logical 143052115968 on dev /dev/md3
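The scrub summary counts line up with this log. The three tree-2 leaves were "fixed up", giving 3 corrected errors. The errors that could not be fixed total 6. Two of them are the tree-5 leaf at logical 2327654400, which is reported at two different sectors (8756888 and 10854040), apparently its two DUP metadata copies. The other four are data blocks, which presumably have only one copy on this device. A repair is only possible when another copy of the same logical block still verifies. The sketch below is a simplified model of that decision, not the kernel's scrub code, and pick_repair_source() is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (not kernel code): copy_ok[i] is nonzero if copy i of a
 * logical block passed verification. Returns the index of a good copy to
 * repair from ("fixed up"), or -1 if every copy is bad ("unable to fixup"). */
int pick_repair_source(const int *copy_ok, size_t ncopies)
{
    assert(copy_ok != NULL || ncopies == 0);
    for (size_t i = 0; i < ncopies; i++)
        if (copy_ok[i])
            return (int)i;
    return -1;
}
```

In this model, a DUP metadata block with one bad copy is repairable from the other copy. The tree-5 leaf, with both copies bad, and a single-copy data block that is bad are not repairable.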

speedy:/tmp/btrfs/btrfs-progs # ./btrfsck /dev/md3
checking extents
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
owner ref check failed [2327654400 4096]
ref mismatch on [101138354176 98304] extent item 1, found 0
Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 found 0 wanted 1 back 0x787c260
backpointer mismatch on [101138354176 98304]
owner ref check failed [101138354176 98304]
ref mismatch on [101138452480 106496] extent item 1, found 0
Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 found 0 wanted 1 back 0x787c2a0
backpointer mismatch on [101138452480 106496]
owner ref check failed [101138452480 106496]
ref mismatch on [101138558976 8192] extent item 1, found 0
Incorrect local backref count on 101138558976 root 5 owner 1867901 offset 0 found 0 wanted 1 back 0x2d2a700
backpointer mismatch on [101138558976 8192]
owner ref check failed [101138558976 8192]
ref mismatch on [101138567168 16384] extent item 1, found 0
Incorrect local backref count on 101138567168 root 5 owner 1867902 offset 0 found 0 wanted 1 back 0x2d2a740
backpointer mismatch on [101138567168 16384]
owner ref check failed [101138567168 16384]
ref mismatch on [101138583552 16384] extent item 1, found 0
Incorrect local backref count on 101138583552 root 5 owner 1867903 offset 0 found 0 wanted 1 back 0x2d2a780
backpointer mismatch on [101138583552 16384]
owner ref check failed [101138583552 16384]
Errors found in extent allocation tree
checking fs roots
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
btrfsck: btrfsck.c:1177: walk_down_tree: Assertion `!(1)' failed.
Aborted
Martin Steigerwald July 15, 2012, 2:05 p.m. UTC | #4
On Wednesday, 11 July 2012, haveaniceday@cv-sv.de wrote:
> PS: I would bet that my kind of usage is a very good stress test for
> btrfs.
> 
> - large file system "/backup" btrfs with compress enabled.
> 
> Content of the file system:
> - ./server1 .... /server5  as directories
> - for each server the directory has a structure like this:
>   backup-YYYY-DD-MM-HH:M
>   New backups are created with:
>   rsync -axvH --link-dest=/backup/server'n'/backup-...(old, last dir)..
>   server:/   /backup/server'n'/backup-YYY-.../.
> 
> This generates files with a large number of hard links.

I am using

btrfs subvolume snapshot -r

after an rsync backup for exactly this case.

Ciao,
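As we read it, Martin's approach rsyncs into a btrfs subvolume and then takes a read-only snapshot after each run, instead of building a hard-link tree. The small sketch below only builds the snapshot command string, for example to pass to system(). The paths and the helper name are hypothetical, and the helper rejects paths containing spaces or quotes rather than attempting shell quoting:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "btrfs subvolume snapshot -r SUBVOL DEST"
 * into buf. Returns 0 on success, -1 for a rejected path or a buffer that
 * is too small to hold the whole command. */
int build_snapshot_cmd(char *buf, size_t size,
                       const char *subvol, const char *dest)
{
    const char *unsafe = " \t'\"";
    int n;

    assert(buf != NULL && subvol != NULL && dest != NULL);
    /* Refuse anything that would need shell quoting. */
    if (strpbrk(subvol, unsafe) || strpbrk(dest, unsafe))
        return -1;
    n = snprintf(buf, size, "btrfs subvolume snapshot -r %s %s", subvol, dest);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

A backup run would then rsync into, say, /backup/server1/current and afterwards execute the built command with a per-run destination such as /backup/server1/snap-1 (both hypothetical paths).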

Patch

--- btrfsck.c   2012-07-10 10:23:24.781622144 +0200
+++ btrfsck.c   2012-07-10 12:59:00.120146266 +0200
@@ -1173,7 +1173,7 @@ 
                WARN_ON(*level >= BTRFS_MAX_LEVEL);
                cur = path->nodes[*level];

-               if (btrfs_header_level(cur) != *level)
+               if (!cur || btrfs_header_level(cur) != *level)
                        WARN_ON(1);

                if (path->slots[*level] >= btrfs_header_nritems(cur))
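The first patch turns the crash into an assertion only because, in this build, WARN_ON(1) aborts (see the "Assertion `!(1)' failed" output). If WARN_ON merely logged, the next line would still pass the NULL cur to btrfs_header_nritems(). The essential step is to test for a missing node before touching its header. The sketch below shows this with hypothetical, heavily simplified types (model_node, model_path), not the btrfs-progs structures:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 8

/* Hypothetical, heavily simplified stand-ins; NOT the btrfs-progs types. */
struct model_node {
    int level;    /* level stored in the block header */
    int nritems;  /* number of items in the block */
};

struct model_path {
    struct model_node *nodes[MODEL_MAX_LEVEL]; /* NULL if not read */
    int slots[MODEL_MAX_LEVEL];
};

/* Returns 1 if the node at `level` exists and its header agrees with the
 * level, 0 otherwise. The NULL test comes first, so a missing node (e.g. a
 * block that could not be read) is never dereferenced. */
int node_is_consistent(const struct model_path *path, int level)
{
    const struct model_node *cur;

    assert(level >= 0 && level < MODEL_MAX_LEVEL);
    cur = path->nodes[level];
    if (!cur || cur->level != level)
        return 0;
    return 1;
}
```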

I tried to work around this error with the code below. The errors it reports
next are listed in the commit message above.


--- btrfsck.c   2012-07-10 10:23:24.781622144 +0200
+++ btrfsck.c   2012-07-10 12:36:51.995996771 +0200
@@ -1173,8 +1173,13 @@ 
                WARN_ON(*level >= BTRFS_MAX_LEVEL);
                cur = path->nodes[*level];

-               if (btrfs_header_level(cur) != *level)
-                       WARN_ON(1);
+               if (cur) {
+                       if (btrfs_header_level(cur) != *level)
+                               WARN_ON(1);
+               } else {
+                       fprintf(stderr, "CVCV path->nodes[*level] is 0!\n");
+                       break;
+               }

                if (path->slots[*level] >= btrfs_header_nritems(cur))
                        break;
@@ -1213,7 +1218,11 @@ 
                path->slots[*level] = 0;
        }
 out:
+       if (path->nodes[*level]) {
        path->slots[*level] = btrfs_header_nritems(path->nodes[*level]);
+       } else {
+               path->slots[*level] = 0;
+       }
        return 0;
 }
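The second patch skips the missing node instead. It breaks out of the descent loop and, at out:, reads nritems only when a node is present. The sketch below shows the same control flow in a hypothetical simplified model. Unlike the patch, which still returns 0, this sketch returns -1 so that a caller could tell a subtree was skipped:

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_MAX_LEVEL 8

/* Hypothetical simplified types; NOT the btrfs-progs structures. */
struct model_node {
    int level;
    int nritems;
};

struct model_path {
    struct model_node *nodes[MODEL_MAX_LEVEL]; /* NULL if not read */
    int slots[MODEL_MAX_LEVEL];
};

/* Finish the walk at `level` the way the second patch does: report a
 * missing node, then (the "out:" step) mark the level as consumed using
 * nritems only if a node is actually present. */
int model_finish_level(struct model_path *path, int level)
{
    int ret = 0;

    assert(level >= 0 && level < MODEL_MAX_LEVEL);
    if (path->nodes[level] == NULL) {
        fprintf(stderr, "node at level %d is missing, skipping\n", level);
        ret = -1;
    }

    /* out: never read nritems through a NULL node */
    if (path->nodes[level])
        path->slots[level] = path->nodes[level]->nritems;
    else
        path->slots[level] = 0;
    return ret;
}
```

Skipping keeps btrfsck running past the damaged leaf. This matches the follow-up run reported in the thread, which prints "CVCV path->nodes[*level] is 0!" and then goes on to list the "unresolved ref" errors instead of aborting.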