Message ID | 4FF6FF8C.4010007@jan-o-sch.net (mailing list archive) |
---|---|
State | New, archived |
On Fri, Jul 06, 2012 at 10:59:24PM +0300, Sami Liedes wrote:
> I think I might try running it overnight with KMEMCHECK to see if it
> reports something. But for now, what there's in the log:

My KMEMCHECK kernel didn't even boot (due to some weird KMEMCHECK/ACPI
interaction), so I won't pursue this idea further at the moment...

> * lots of checksum mismatch [234], no 1s

One thing to notice from the logs, too, is that the device seems to
always be dm-6, the second device of the filesystem. This never seems
to happen to dm-5. There are 1583 lines of "btrfs: dm-6 checksum
verify failed".

Sami
[Retry: I think this mail didn't make it to the list, probably because
of the 73 kilobyte attached log. Here's a URL to the file:]

http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz

Sami

------------------------------------------------------------
On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
> Oh I see. root->node can be NULL during mount. Please add this on top:

Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time
it took half an hour to crash, and there's _lots_ of checksum mismatch
[234] messages even before the crash. gzipped dmesg attached.

At 781 seconds there's an "irq 17: nobody cared". That's a known bug
with this (and other Asus) motherboards and happens every now and
then. I doubt it has anything to do with this.

I think I might try running it overnight with KMEMCHECK to see if it
reports something. But for now, what there's in the log:

* lots of checksum mismatch [234], no 1s

* a fair number of "csum_tree_block: [0-9]+ callbacks suppressed"
  lines

* two "btrfs: node seems invalid now. checksum ok = 1" messages, one
  at 1499 seconds and another just before the crash at 1973

* Just before the crash:
  btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0

* the oopses

> > By the way, something seems to be untabifying your patches. I don't
> > know if it's on my side or yours, but at least some other patches I
> > receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
> > mostly makes them apply cleanly for me.
>
> Oh, I'm sorry. Should have been on my side. I hope it's better with the current
> diff?

Yes. No problem :)

[See attachment for dmesg log.]

Sami
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07.07.2012 01:44, Sami Liedes wrote:
> [Retry: I think this mail didn't make it to the list, probably because
> of the 73 kilobyte attached log. Here's a URL to the file:]
>
> http://www.niksula.hut.fi/~sliedes/btrfs-scrub-debug.log.gz
>
> Sami
>
>
> ------------------------------------------------------------
> On Fri, Jul 06, 2012 at 05:09:00PM +0200, Jan Schmidt wrote:
>> Oh I see. root->node can be NULL during mount. Please add this on top:
>
> Ok. So, ran it with DEBUG_PAGEALLOC and slub debugging on. This time
> it took half an hour to crash, and there's _lots_ of checksum mismatch
> [234] messages even before the crash. gzipped dmesg attached.
>
> At 781 seconds there's an "irq 17: nobody cared". That's a known bug
> with this (and other Asus) motherboards and happens every now and
> then. I doubt it has anything to do with this.
>
> I think I might try running it overnight with KMEMCHECK to see if it
> reports something. But for now, what there's in the log:
>
> * lots of checksum mismatch [234], no 1s
>
> * a fair number of "csum_tree_block: [0-9]+ callbacks suppressed"
>   lines
>
> * two "btrfs: node seems invalid now. checksum ok = 1" messages, one
>   at 1499 seconds and another just before the crash at 1973
>
> * Just before the crash:
>   btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2261163409408, level 100, generation 4412718571037421157, nritems 538968254. len param 17. debug 2/989/538968254/4412718571037421157/0x0/0/0x0/0x0
>

At a first glance: the generation converted to ascii is: "ent() ==", so
someone is patching the memory with ascii text, possibly C source. It
might be interesting to dump the full contents of the eb, to get a clue
on the source of the data.

> * the oopses
>
>>> By the way, something seems to be untabifying your patches. I don't
>>> know if it's on my side or yours, but at least some other patches I
>>> receive via linux-btrfs contain tabs. Doing a M-x tabify in emacs
>>> mostly makes them apply cleanly for me.
>>
>> Oh, I'm sorry. Should have been on my side. I hope it's better with the current
>> diff?
>
> Yes. No problem :)
>
> [See attachment for dmesg log.]
>
> Sami
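The remark above that the generation value reads as ASCII can be checked by hand. The following is only a sketch: le64_to_ascii() is a hypothetical helper, not a btrfs function, and it assumes the 64-bit header field is stored little-endian, so the least significant byte is the first byte in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of btrfs): interpret a 64-bit value as
 * the 8 bytes of a little-endian field and write them to out[] as text.
 * Bytes outside the printable ASCII range are shown as '.'.
 * out must have room for 9 bytes (8 characters plus the terminator).
 */
static void le64_to_ascii(uint64_t value, char *out)
{
	assert(out != NULL);

	for (size_t i = 0; i < 8; i++) {
		/* Byte i in memory is bits [8*i, 8*i + 7] of the value. */
		unsigned char byte = (unsigned char)(value >> (8 * i));

		out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
	}
	out[8] = '\0';
}
```

Calling le64_to_ascii(4412718571037421157ULL, buf) fills buf with "ent() ==" (the value is 0x3d3d202928746e65), which matches the observation. The same helper could be run over the other header fields from the log line, such as level 100, whose low byte is 'd'.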
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index df0b347..22838a3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -578,7 +578,8 @@ static noinline int check_node(struct btrfs_root *root,
 	}
 	node->debug[5] = node->start;
 	node->debug[6] = btrfs_header_level(node);
-	node->debug[6] |= btrfs_header_level(root->node) << 16;
+	if (root->node)
+		node->debug[6] |= btrfs_header_level(root->node) << 16;
 	node->debug[7] = 0xb22f50b22f5;
 
 	return 0;