[V3,2/2] Btrfs: fix the snapshot that should not exist

Message ID 501A04CE.9090408@cn.fujitsu.com (mailing list archive)
State New, archived

Commit Message

Miao Xie Aug. 2, 2012, 4:40 a.m. UTC
A snapshot should be an image of the fs tree as it was before the snapshot was
created, so the metadata of the snapshot itself should not exist in the
snapshot's tree. But currently the directory item and directory name index of
the snapshot are in both the snapshot tree and the fs tree. This introduces
some problems and confuses users:

 # mkfs.btrfs /dev/sda1
 # mount /dev/sda1 /mnt
 # mkdir /mnt/1
 # cd /mnt/1
 # btrfs subvolume snapshot /mnt snap0
 # ls -a /mnt/1/snap0/1
 .	..	[no other file/dir]

 # ll /mnt/1/snap0/
 total 0
 drwxr-xr-x 1 root root 10 Jul 24 12:11 1
			^^^
			There is no file/dir in it, but its size is 10

 # cd /mnt/1/snap0/1/snap0
 [Entering a nonexistent directory succeeds...]

There is nothing in directory 1 in snap0, but btrfs reports that the length of
this directory is 10. Besides that, we can enter a nonexistent directory, which
is very strange to users.

 # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1
 # ll /mnt/1/snap0/1/
 total 0
 [None]
 # ll /mnt/snap1/1/
 total 0
 drwxr-xr-x 1 root root 0 Jul 24 12:14 snap0

The source of snap1 did not have any entry in directory 1, but snap1 has a
snap0 there, so the snapshot differs from its source.

So I think we should insert the directory item and directory name index and
update the parent inode as the last step of snapshot creation, and not leave
the useless metadata in the file tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v2 -> v3:
- rebase on the latest for-linus branch

Changelog v1 -> v2:
- add a comment to explain why we need to deal with the delayed items after
  snapshot creation and why this operation does not corrupt the metadata.
- move dput() to patch 1/2
---
 fs/btrfs/transaction.c |   66 +++++++++++++++++++++++++++++++++++++----------
 1 files changed, 52 insertions(+), 14 deletions(-)

Comments

David Sterba Aug. 2, 2012, 11:46 a.m. UTC | #1
Hi,

apologies for the late reply,

On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote:
> Changelog v1 -> v2:
> - add a comment to explain why we need to deal with the delayed items after
>   snapshot creation and why this operation does not corrupt the metadata.

I'm sorry, the comment did not fix the bug :)

The subvol stress is able to hit this:

[ 2360.444321] ------------[ cut here ]------------
[ 2360.448019] kernel BUG at fs/btrfs/extent-tree.c:6047!
[ 2360.448019] invalid opcode: 0000 [#1] SMP
[ 2360.448019] CPU 0
[ 2360.448019] Modules linked in: btrfs aoe [last unloaded: btrfs]
[ 2360.448019]
[ 2360.448019] Pid: 8212, comm: btrfs Not tainted 3.5.0-default+ #170 Intel Corporation Santa Rosa platform/Matanzas
[ 2360.448019] RIP: 0010:[<ffffffffa00f62a1>]  [<ffffffffa00f62a1>] run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.448019] RSP: 0018:ffff88003eca1a68  EFLAGS: 00010246
[ 2360.448019] RAX: 00000000000007ff RBX: ffff880017a694c8 RCX: ffff88003eca1a08
[ 2360.448019] RDX: ffff880028aa9000 RSI: 00000000000007fe RDI: ffff880064223cf0
[ 2360.448019] RBP: ffff88003eca1b48 R08: 00000000000007ff R09: ffff88003eca19f8
[ 2360.448019] R10: ffff88002435d1e8 R11: 0000000000000000 R12: ffff880025d66d28
[ 2360.448019] R13: ffff880038640000 R14: ffff8800778dfa88 R15: ffff880060f010d0
[ 2360.448019] FS:  00007f3289f35740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
[ 2360.448019] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2360.448019] CR2: ffffffffff600400 CR3: 000000002e112000 CR4: 00000000000007f0
[ 2360.448019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2360.448019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2360.448019] Process btrfs (pid: 8212, threadinfo ffff88003eca0000, task ffff88001d834200)
[ 2360.448019] Stack:
[ 2360.448019]  0000000000000000 0000000000000000 0000000000000001 0000000000000000
[ 2360.448019]  00000000000007ed ffff88002435d1e8 000000003eca1b18 0000000000000000
[ 2360.448019]  0000000000000770 0000000000000000 000000005cb1e000 ffff88003eca1c08
[ 2360.448019] Call Trace:
[ 2360.448019]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
[ 2360.448019]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
[ 2360.448019]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
[ 2360.448019]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[ 2360.448019]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
[ 2360.448019]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
[ 2360.448019]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[ 2360.448019]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
[ 2360.448019]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
[ 2360.448019]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
[ 2360.448019]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
[ 2360.448019]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
[ 2360.448019]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
[ 2360.448019]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
[ 2360.448019]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
[ 2360.448019]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
[ 2360.448019]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
[ 2360.448019]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
[ 2360.448019]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
[ 2360.448019]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
[ 2360.448019] Code: 8b 76 40 48 89 d7 48 89 55 a0 e8 2b 74 ff ff 83 f8 17 0f 87 1e ff ff ff 0f 0b 80 fa b2 0f 84 b4 f8 ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57
[ 2360.448019] RIP  [<ffffffffa00f62a1>] run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.448019]  RSP <ffff88003eca1a68>
[ 2360.814508] ---[ end trace 555a16cac3620ccb ]---
[ 2360.820398] note: btrfs[8212] exited with preempt_count 1
[ 2360.827072] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 2360.836047] in_atomic(): 1, irqs_disabled(): 0, pid: 8212, name: btrfs
[ 2360.843859] INFO: lockdep is turned off.
[ 2360.849021] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ #170
[ 2360.849022] Call Trace:
[ 2360.849027]  [<ffffffff8107a40c>] __might_sleep+0xfc/0x130
[ 2360.849030]  [<ffffffff818ea0f6>] down_read+0x26/0xa0
[ 2360.849034]  [<ffffffff810b416b>] acct_collect+0x4b/0x1b0
[ 2360.849038]  [<ffffffff8104c838>] do_exit+0x718/0x9a0
[ 2360.849041]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
[ 2360.849043]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
[ 2360.849046]  [<ffffffff81005a7b>] die+0x5b/0x90
[ 2360.849048]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
[ 2360.849052]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
[ 2360.849067]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.849071]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 2360.849073]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
[ 2360.849076]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
[ 2360.849087]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.849097]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
[ 2360.849108]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
[ 2360.849110]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
[ 2360.849119]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
[ 2360.849133]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[ 2360.849145]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
[ 2360.849148]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
[ 2360.849161]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[ 2360.849164]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
[ 2360.849166]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
[ 2360.849180]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
[ 2360.849194]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
[ 2360.849207]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
[ 2360.849221]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
[ 2360.849224]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
[ 2360.849226]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
[ 2360.849229]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
[ 2360.849231]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
[ 2360.849234]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
[ 2360.849236]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
[ 2360.849239]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
[ 2360.849255] BUG: scheduling while atomic: btrfs/8212/0x10000002
[ 2360.849256] INFO: lockdep is turned off.
[ 2360.849257] Modules linked in: btrfs aoe [last unloaded: btrfs]
[ 2360.849261] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ #170
[ 2360.849262] Call Trace:
[ 2360.849262]  [<ffffffff81078318>] __schedule_bug+0x68/0x90
[ 2360.849265]  [<ffffffff818eafcc>] __schedule+0x73c/0x810
[ 2360.849268]  [<ffffffff8107b48a>] __cond_resched+0x2a/0x40
[ 2360.849270]  [<ffffffff818eb121>] _cond_resched+0x31/0x40
[ 2360.849273]  [<ffffffff81128e13>] unmap_single_vma+0x493/0x750
[ 2360.849276]  [<ffffffff811100b0>] ? lru_deactivate_fn+0x1e0/0x1e0
[ 2360.849279]  [<ffffffff810a4be0>] ? trace_hardirqs_on_caller+0x20/0x1d0
[ 2360.849281]  [<ffffffff8112986c>] unmap_vmas+0x3c/0x60
[ 2360.849284]  [<ffffffff81130de1>] exit_mmap+0x81/0x140
[ 2360.849287]  [<ffffffff81043824>] mmput+0x74/0x130
[ 2360.849289]  [<ffffffff8104a520>] exit_mm+0x100/0x120
[ 2360.849292]  [<ffffffff8104c858>] do_exit+0x738/0x9a0
[ 2360.849294]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
[ 2360.849297]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
[ 2360.849299]  [<ffffffff81005a7b>] die+0x5b/0x90
[ 2360.849301]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
[ 2360.849304]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
[ 2360.849307]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.849317]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 2360.849320]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
[ 2360.849322]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
[ 2360.849325]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
[ 2360.849335]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
[ 2360.849346]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
[ 2360.849356]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
[ 2360.849358]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
[ 2360.849367]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[ 2360.849380]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
[ 2360.849393]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
[ 2360.849395]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
[ 2360.849409]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
[ 2360.849411]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
[ 2360.849413]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
[ 2360.849427]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
[ 2360.849441]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
[ 2360.849455]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
[ 2360.849469]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
[ 2360.849471]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
[ 2360.849473]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
[ 2360.849476]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
[ 2360.849478]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
[ 2360.849481]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
[ 2360.849483]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b

fs/btrfs/extent-tree.c:6047

6046         if (parent > 0) {
6047                 BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
6048                 btrfs_set_extent_inline_ref_type(leaf, iref,
6049                                                  BTRFS_SHARED_BLOCK_REF_KEY);
6050                 btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
6051         } else {
6052                 btrfs_set_extent_inline_ref_type(leaf, iref,
6053                                                  BTRFS_TREE_BLOCK_REF_KEY);
6054                 btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid);
6055         }


Currently for-linus hangs early during the test, so I applied the V3 patches on
top of 3.5.

The filesystem is freshly created. The load simultaneously unpacks a large tar,
snapshots the fs, deletes a random snapshot, and loops rm of the untarred dir.
It crashes reliably after some minutes.

Fsck spits lots of errors:

ref mismatch on [1133031424 4096] extent item 1, found 0
Backref 1133031424 root 5 not referenced back 0x7d1f40
Incorrect global backref count on 1133031424 found 1 wanted 0
backpointer mismatch on [1133031424 4096]
owner ref check failed [1133031424 4096]

ref mismatch on [11213131776 16384] extent item 1, found 0
Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 found 0 wanted 1 back 0x1424d8e0
backpointer mismatch on [11213131776 16384]
owner ref check failed [11213131776 16384]

fs tree 260 refs 6 not found
        unresolved ref root 263 dir 256 index 4 namelen 14 name snap2748615355 error 600
        unresolved ref root 267 dir 256 index 4 namelen 14 name snap2748615355 error 600
        unresolved ref root 269 dir 256 index 4 namelen 14 name snap2748615355 error 600
        unresolved ref root 273 dir 256 index 4 namelen 14 name snap2748615355 error 600
        unresolved ref root 274 dir 256 index 4 namelen 14 name snap2748615355 error 600
        unresolved ref root 276 dir 256 index 4 namelen 14 name snap2748615355 error 600


I've asked Josef to pull those patches out of btrfs-next. Feel free to send me
any testing version if you can't reproduce it on your side.


david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miao Xie Aug. 3, 2012, 1:53 a.m. UTC | #2
On Thu, 2 Aug 2012 13:46:31 +0200, David Sterba wrote:
> Hi,
> 
> apologies for the late reply,
> 
> On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote:
>> Changelog v1 -> v2:
>> - add a comment to explain why we need to deal with the delayed items after
>>   snapshot creation and why this operation does not corrupt the metadata.
> 
> I'm sorry, the comment did not fix the bug :)
> 
> The subvol stress is able to hit this:
> 
> [ 2360.444321] ------------[ cut here ]------------
> [ 2360.448019] kernel BUG at fs/btrfs/extent-tree.c:6047!
> 
> fs/btrfs/extent-tree.c:6047
> 
> 6046         if (parent > 0) {
> 6047                 BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
> 6048                 btrfs_set_extent_inline_ref_type(leaf, iref,
> 6049                                                  BTRFS_SHARED_BLOCK_REF_KEY);
> 6050                 btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
> 6051         } else {
> 6052                 btrfs_set_extent_inline_ref_type(leaf, iref,
> 6053                                                  BTRFS_TREE_BLOCK_REF_KEY);
> 6054                 btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid);
> 6055         }

This bug is similar to the one reported by Daniel J Blueman a month ago. Josef
has fixed it, but the patch has not been merged into the for-linus branch yet.
Did you apply that patch?

> 
> Currently for-linus hangs early during the test, so I applied the V3 patches on
> top of 3.5.
> 
> The filesystem is freshly created. The load simultaneously unpacks a large tar,
> snapshots the fs, deletes a random snapshot, and loops rm of the untarred dir.
> It crashes reliably after some minutes.

Could you send the test tool to me? I want to look into it.

Thanks
Miao



Josef Bacik Aug. 3, 2012, 9:03 p.m. UTC | #3
On Thu, Aug 02, 2012 at 07:53:36PM -0600, Miao Xie wrote:
> On Thu, 2 Aug 2012 13:46:31 +0200, David Sterba wrote:
> > Hi,
> >
> > apologies for the late reply,
> >
> > On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote:
> >> Changelog v1 -> v2:
> >> - add a comment to explain why we need to deal with the delayed items after
> >>   snapshot creation and why this operation does not corrupt the metadata.
> >
> > I'm sorry, the comment did not fix the bug :)
> >
> > The subvol stress is able to hit this:
> >
> > [ 2360.444321] ------------[ cut here ]------------
> > [ 2360.448019] kernel BUG at fs/btrfs/extent-tree.c:6047!
> > [ 2360.448019] invalid opcode: 0000 [#1] SMP
> > [ 2360.448019] CPU 0
> > [ 2360.448019] Modules linked in: btrfs aoe [last unloaded: btrfs]
> > [ 2360.448019]
> > [ 2360.448019] Pid: 8212, comm: btrfs Not tainted 3.5.0-default+ #170 Intel Corporation Santa Rosa platform/Matanzas
> > [ 2360.448019] RIP: 0010:[<ffffffffa00f62a1>]  [<ffffffffa00f62a1>] run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.448019] RSP: 0018:ffff88003eca1a68  EFLAGS: 00010246
> > [ 2360.448019] RAX: 00000000000007ff RBX: ffff880017a694c8 RCX: ffff88003eca1a08
> > [ 2360.448019] RDX: ffff880028aa9000 RSI: 00000000000007fe RDI: ffff880064223cf0
> > [ 2360.448019] RBP: ffff88003eca1b48 R08: 00000000000007ff R09: ffff88003eca19f8
> > [ 2360.448019] R10: ffff88002435d1e8 R11: 0000000000000000 R12: ffff880025d66d28
> > [ 2360.448019] R13: ffff880038640000 R14: ffff8800778dfa88 R15: ffff880060f010d0
> > [ 2360.448019] FS:  00007f3289f35740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
> > [ 2360.448019] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 2360.448019] CR2: ffffffffff600400 CR3: 000000002e112000 CR4: 00000000000007f0
> > [ 2360.448019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 2360.448019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 2360.448019] Process btrfs (pid: 8212, threadinfo ffff88003eca0000, task ffff88001d834200)
> > [ 2360.448019] Stack:
> > [ 2360.448019]  0000000000000000 0000000000000000 0000000000000001 0000000000000000
> > [ 2360.448019]  00000000000007ed ffff88002435d1e8 000000003eca1b18 0000000000000000
> > [ 2360.448019]  0000000000000770 0000000000000000 000000005cb1e000 ffff88003eca1c08
> > [ 2360.448019] Call Trace:
> > [ 2360.448019]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
> > [ 2360.448019]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> > [ 2360.448019]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> > [ 2360.448019]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
> > [ 2360.448019]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
> > [ 2360.448019]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> > [ 2360.448019]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> > [ 2360.448019]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> > [ 2360.448019]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> > [ 2360.448019]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> > [ 2360.448019]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> > [ 2360.448019]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> > [ 2360.448019]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> > [ 2360.448019]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> > [ 2360.448019]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> > [ 2360.448019]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> > [ 2360.448019]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> > [ 2360.448019]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> > [ 2360.448019]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> > [ 2360.448019]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> > [ 2360.448019] Code: 8b 76 40 48 89 d7 48 89 55 a0 e8 2b 74 ff ff 83 f8 17 0f 87 1e ff ff ff 0f 0b 80 fa b2 0f 84 b4 f8 ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57
> > [ 2360.448019] RIP  [<ffffffffa00f62a1>] run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.448019]  RSP <ffff88003eca1a68>
> > [ 2360.814508] ---[ end trace 555a16cac3620ccb ]---
> > [ 2360.820398] note: btrfs[8212] exited with preempt_count 1
> > [ 2360.827072] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> > [ 2360.836047] in_atomic(): 1, irqs_disabled(): 0, pid: 8212, name: btrfs
> > [ 2360.843859] INFO: lockdep is turned off.
> > [ 2360.849021] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ #170
> > [ 2360.849022] Call Trace:
> > [ 2360.849027]  [<ffffffff8107a40c>] __might_sleep+0xfc/0x130
> > [ 2360.849030]  [<ffffffff818ea0f6>] down_read+0x26/0xa0
> > [ 2360.849034]  [<ffffffff810b416b>] acct_collect+0x4b/0x1b0
> > [ 2360.849038]  [<ffffffff8104c838>] do_exit+0x718/0x9a0
> > [ 2360.849041]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
> > [ 2360.849043]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
> > [ 2360.849046]  [<ffffffff81005a7b>] die+0x5b/0x90
> > [ 2360.849048]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
> > [ 2360.849052]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
> > [ 2360.849067]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.849071]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> > [ 2360.849073]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
> > [ 2360.849076]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
> > [ 2360.849087]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.849097]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
> > [ 2360.849108]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
> > [ 2360.849110]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> > [ 2360.849119]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> > [ 2360.849133]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
> > [ 2360.849145]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
> > [ 2360.849148]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> > [ 2360.849161]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> > [ 2360.849164]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> > [ 2360.849166]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> > [ 2360.849180]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> > [ 2360.849194]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> > [ 2360.849207]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> > [ 2360.849221]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> > [ 2360.849224]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> > [ 2360.849226]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> > [ 2360.849229]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> > [ 2360.849231]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> > [ 2360.849234]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> > [ 2360.849236]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> > [ 2360.849239]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> > [ 2360.849255] BUG: scheduling while atomic: btrfs/8212/0x10000002
> > [ 2360.849256] INFO: lockdep is turned off.
> > [ 2360.849257] Modules linked in: btrfs aoe [last unloaded: btrfs]
> > [ 2360.849261] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ #170
> > [ 2360.849262] Call Trace:
> > [ 2360.849262]  [<ffffffff81078318>] __schedule_bug+0x68/0x90
> > [ 2360.849265]  [<ffffffff818eafcc>] __schedule+0x73c/0x810
> > [ 2360.849268]  [<ffffffff8107b48a>] __cond_resched+0x2a/0x40
> > [ 2360.849270]  [<ffffffff818eb121>] _cond_resched+0x31/0x40
> > [ 2360.849273]  [<ffffffff81128e13>] unmap_single_vma+0x493/0x750
> > [ 2360.849276]  [<ffffffff811100b0>] ? lru_deactivate_fn+0x1e0/0x1e0
> > [ 2360.849279]  [<ffffffff810a4be0>] ? trace_hardirqs_on_caller+0x20/0x1d0
> > [ 2360.849281]  [<ffffffff8112986c>] unmap_vmas+0x3c/0x60
> > [ 2360.849284]  [<ffffffff81130de1>] exit_mmap+0x81/0x140
> > [ 2360.849287]  [<ffffffff81043824>] mmput+0x74/0x130
> > [ 2360.849289]  [<ffffffff8104a520>] exit_mm+0x100/0x120
> > [ 2360.849292]  [<ffffffff8104c858>] do_exit+0x738/0x9a0
> > [ 2360.849294]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
> > [ 2360.849297]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
> > [ 2360.849299]  [<ffffffff81005a7b>] die+0x5b/0x90
> > [ 2360.849301]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
> > [ 2360.849304]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
> > [ 2360.849307]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.849317]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> > [ 2360.849320]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
> > [ 2360.849322]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
> > [ 2360.849325]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> > [ 2360.849335]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
> > [ 2360.849346]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 [btrfs]
> > [ 2360.849356]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> > [ 2360.849358]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> > [ 2360.849367]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
> > [ 2360.849380]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 [btrfs]
> > [ 2360.849393]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> > [ 2360.849395]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> > [ 2360.849409]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> > [ 2360.849411]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> > [ 2360.849413]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> > [ 2360.849427]  [<ffffffffa013fe5a>] btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> > [ 2360.849441]  [<ffffffffa014005d>] btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> > [ 2360.849455]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> > [ 2360.849469]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> > [ 2360.849471]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> > [ 2360.849473]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> > [ 2360.849476]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> > [ 2360.849478]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> > [ 2360.849481]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> > [ 2360.849483]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> >
> > fs/btrfs/extent-tree.c:6047
> >
> > 6046         if (parent > 0) {
> > 6047                 BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
> > 6048                 btrfs_set_extent_inline_ref_type(leaf, iref,
> > 6049                                                  BTRFS_SHARED_BLOCK_REF_KEY);
> > 6050                 btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
> > 6051         } else {
> > 6052                 btrfs_set_extent_inline_ref_type(leaf, iref,
> > 6053                                                  BTRFS_TREE_BLOCK_REF_KEY);
> > 6054                 btrfs_set_extent_inline_ref_offset(leaf, iref, root_objectid);
> > 6055         }
> 
> This bug is similar to the one reported by Daniel J Blueman a month ago. Josef
> has fixed it, but the patch has not been merged into the for-linus branch yet.
> Did you apply that patch?
> 

What patch is this?  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miao Xie Aug. 4, 2012, 5:53 a.m. UTC | #4
On Fri, 3 Aug 2012 17:03:01 -0400, Josef Bacik wrote:
> On Thu, Aug 02, 2012 at 07:53:36PM -0600, Miao Xie wrote:
>> On Thu, 2 Aug 2012 13:46:31 +0200, David Sterba wrote:
>>> Hi,
>>>
>>> apologies for the late reply,
>>>
>>> On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote:
>>>> Changelog v1 -> v2:
>>>> - add comment to explain why we need to deal with the delayed items after
>>>>   snapshot creation and why this operation does not corrupt the metadata.
>>>
>>> I'm sorry, the comment did not fix the bug :)
>>>
>>> The subvol stress is able to hit this:
>>>
>>> [...]
>>
>> This bug is similar to the one reported by Daniel J Blueman a month ago. Josef
>> has fixed it, but the patch has not been merged into the for-linus branch yet.
>> Did you apply that patch?
>>
> 
> What patch is this?  Thanks,

http://marc.info/?l=linux-btrfs&m=134227134622271&w=2

URL of the bug report is:
http://marc.info/?l=linux-btrfs&m=134120032905388&w=2

But I'm not sure these two bugs are the same, so I need David's test tool to look into it.

Thanks
Miao
David Sterba Aug. 8, 2012, 1:38 p.m. UTC | #5
On Sat, Aug 04, 2012 at 01:53:28PM +0800, Miao Xie wrote:
> But I'm not sure these two bugs are the same, so I need David's test tool
> to look into it.

Attached. It's a set of scripts with a few assumptions hardcoded, such as
where the tar source is and the name of the extracted directory, so
it'll need a few tweaks. Also, the actions are started inside a tmux
session for convenience.

The expected stress load generates lots of files and directories,
creates and deletes snapshots, and removes the untarred directory. The 'rm'
step is delayed a few minutes so that tar generates enough data. On some
hosts the rm phase is fast and will clean the untar directory too
quickly.


david
Miao Xie Aug. 9, 2012, 3:28 a.m. UTC | #6
On Wed, 8 Aug 2012 15:38:41 +0200, David Sterba wrote:
> On Sat, Aug 04, 2012 at 01:53:28PM +0800, Miao Xie wrote:
>> But I'm not sure these two bugs are the same, so I need David's test tool
>> to look into it.
> 
> Attached. It's a set of scripts with a few assumptions hardcoded, such as
> where the tar source is and the name of the extracted directory, so
> it'll need a few tweaks. Also, the actions are started inside a tmux
> session for convenience.
> 
> The expected stress load generates lots of files and directories,
> creates and deletes snapshots, and removes the untarred directory. The 'rm'
> step is delayed a few minutes so that tar generates enough data. On some
> hosts the rm phase is fast and will clean the untar directory too
> quickly.

Thanks.

I found that the bug you reported was not introduced by my patch; it is an old bug.
I have sent an RFC patch, and I would appreciate any comments on it.

Regards
Miao
Mitch Harder Sept. 18, 2012, 7:24 p.m. UTC | #7
On Thu, Aug 2, 2012 at 6:46 AM, David Sterba <dave@jikos.cz> wrote:
...
>
> Fsck spits lots of errors:
>
> ref mismatch on [1133031424 4096] extent item 1, found 0
> Backref 1133031424 root 5 not referenced back 0x7d1f40
> Incorrect global backref count on 1133031424 found 1 wanted 0
> backpointer mismatch on [1133031424 4096]
> owner ref check failed [1133031424 4096]
>
> ref mismatch on [11213131776 16384] extent item 1, found 0
> Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 found 0 wanted 1 back 0x1424d8e0
> backpointer mismatch on [11213131776 16384]
> owner ref check failed [11213131776 16384]
>
> fs tree 260 refs 6 not found
>         unresolved ref root 263 dir 256 index 4 namelen 14 name snap2748615355 error 600
>         unresolved ref root 267 dir 256 index 4 namelen 14 name snap2748615355 error 600
>         unresolved ref root 269 dir 256 index 4 namelen 14 name snap2748615355 error 600
>         unresolved ref root 273 dir 256 index 4 namelen 14 name snap2748615355 error 600
>         unresolved ref root 274 dir 256 index 4 namelen 14 name snap2748615355 error 600
>         unresolved ref root 276 dir 256 index 4 namelen 14 name snap2748615355 error 600
>
>
> I've asked Josef to pull those patches out of btrfs-next, feel free to send me any testing
> version if you can't reproduce it on your side.
>

I've run into similar errors after an unclean shutdown on a partition
where I make use of several subvolumes.

Some of the data in the subvolume is inaccessible, although the
original root volume seems OK.

So far, the partition is resisting my efforts to fix the errors.

This unclean shutdown occurred while using a 3.5.3 kernel merged with
the for-linus branch, so it did not contain any of Miao Xie's recent
patches to address this issue.

I've made an image of the corrupted volume if anybody has something
they'd like me to test.  But I'm primarily reporting this to let you
know I'm seeing errors similar to the ones thrown off by your test
case.

I'm going to look into merging the patches from Josef's btrfs-next to
see if the problem recurs.
Patch

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4e9c106..7943dc2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -950,6 +950,8 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	struct btrfs_root *parent_root;
 	struct btrfs_block_rsv *rsv;
 	struct inode *parent_inode;
+	struct btrfs_path *path;
+	struct btrfs_dir_item *dir_item;
 	struct dentry *parent;
 	struct dentry *dentry;
 	struct extent_buffer *tmp;
@@ -962,6 +964,12 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	u64 root_flags;
 	uuid_le new_uuid;
 
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = pending->error = -ENOMEM;
+		goto path_alloc_fail;
+	}
+
 	new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
 	if (!new_root_item) {
 		ret = pending->error = -ENOMEM;
@@ -1010,22 +1018,20 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 */
 	ret = btrfs_set_inode_index(parent_inode, &index);
 	BUG_ON(ret); /* -ENOMEM */
-	ret = btrfs_insert_dir_item(trans, parent_root,
-				dentry->d_name.name, dentry->d_name.len,
-				parent_inode, &key,
-				BTRFS_FT_DIR, index);
-	if (ret == -EEXIST) {
+
+	/* check if there is a file/dir which has the same name. */
+	dir_item = btrfs_lookup_dir_item(NULL, parent_root, path,
+					 btrfs_ino(parent_inode),
+					 dentry->d_name.name,
+					 dentry->d_name.len, 0);
+	if (dir_item != NULL && !IS_ERR(dir_item)) {
 		pending->error = -EEXIST;
 		goto fail;
-	} else if (ret) {
+	} else if (IS_ERR(dir_item)) {
+		ret = PTR_ERR(dir_item);
 		goto abort_trans;
 	}
-
-	btrfs_i_size_write(parent_inode, parent_inode->i_size +
-					 dentry->d_name.len * 2);
-	ret = btrfs_update_inode(trans, parent_root, parent_inode);
-	if (ret)
-		goto abort_trans;
+	btrfs_release_path(path);
 
 	/*
 	 * pull in the delayed directory update
@@ -1113,12 +1119,29 @@  static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_reloc_post_snapshot(trans, pending);
 	if (ret)
 		goto abort_trans;
+
+	ret = btrfs_insert_dir_item(trans, parent_root,
+				    dentry->d_name.name, dentry->d_name.len,
+				    parent_inode, &key,
+				    BTRFS_FT_DIR, index);
+	/* We have checked the name at the beginning, so it is impossible. */
+	BUG_ON(ret == -EEXIST);
+	if (ret)
+		goto abort_trans;
+
+	btrfs_i_size_write(parent_inode, parent_inode->i_size +
+					 dentry->d_name.len * 2);
+	ret = btrfs_update_inode(trans, parent_root, parent_inode);
+	if (ret)
+		goto abort_trans;
 fail:
 	dput(parent);
 	trans->block_rsv = rsv;
 no_free_objectid:
 	kfree(new_root_item);
 root_item_alloc_fail:
+	btrfs_free_path(path);
+path_alloc_fail:
 	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
 	return ret;
 
@@ -1444,13 +1467,28 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	 */
 	mutex_lock(&root->fs_info->reloc_mutex);
 
-	ret = btrfs_run_delayed_items(trans, root);
+	/*
+	 * We needn't worry about the delayed items because we will
+	 * deal with them in create_pending_snapshot(), which is the
+	 * core function of the snapshot creation.
+	 */
+	ret = create_pending_snapshots(trans, root->fs_info);
 	if (ret) {
 		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
 	}
 
-	ret = create_pending_snapshots(trans, root->fs_info);
+	/*
+	 * We insert the dir indexes of the snapshots and update the inode
+	 * of the snapshots' parents after the snapshot creation, so there
+	 * are some delayed items which are not dealt with. Now deal with
+	 * them.
+	 *
+	 * We needn't worry that this operation will corrupt the snapshots,
+	 * because all the trees which are snapshotted will be forced to COW
+	 * the nodes and leaves.
+	 */
+	ret = btrfs_run_delayed_items(trans, root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
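
The effect of the reordering in create_pending_snapshot() can be illustrated
with a small userspace model. This is only a sketch under stated assumptions,
not kernel code: struct toy_dir, toy_lookup(), toy_insert(),
snapshot_old_order() and snapshot_new_order() are hypothetical names, and a
snapshot is modelled as a plain struct copy of the parent directory. It shows
that inserting the directory entry before taking the copy leaks the entry and
the enlarged size into the snapshot, while checking the name first and
inserting afterwards leaves the snapshot clean.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN    32

/* Hypothetical stand-in for a directory in an fs tree (not a btrfs type). */
struct toy_dir {
	char names[MAX_ENTRIES][NAME_LEN];
	int count;
	unsigned long i_size;
};

/* Return 1 if the name is already present in the directory, 0 otherwise. */
static int toy_lookup(const struct toy_dir *d, const char *name)
{
	for (int i = 0; i < d->count; i++)
		if (strcmp(d->names[i], name) == 0)
			return 1;
	return 0;
}

/* Add an entry and grow i_size by name length * 2, as the patch does. */
static int toy_insert(struct toy_dir *d, const char *name)
{
	assert(strlen(name) < NAME_LEN);
	if (d->count == MAX_ENTRIES)
		return -ENOSPC;
	snprintf(d->names[d->count], NAME_LEN, "%s", name);
	d->count++;
	d->i_size += strlen(name) * 2;
	return 0;
}

/* Old order: insert the entry first, then take the snapshot (struct copy). */
static int snapshot_old_order(struct toy_dir *parent, const char *name,
			      struct toy_dir *snap)
{
	int ret;

	if (toy_lookup(parent, name))
		return -EEXIST;
	ret = toy_insert(parent, name);
	if (ret)
		return ret;
	*snap = *parent;	/* the copy already contains the new entry */
	return 0;
}

/* New order: check the name, take the snapshot, insert the entry last. */
static int snapshot_new_order(struct toy_dir *parent, const char *name,
			      struct toy_dir *snap)
{
	if (toy_lookup(parent, name))
		return -EEXIST;
	*snap = *parent;	/* the copy is taken before the parent changes */
	return toy_insert(parent, name);
}
```

Starting from an empty toy_dir and creating "snap0" with snapshot_old_order()
yields a copy whose i_size is already 10 (5 characters * 2), the same size seen
in the report, whereas snapshot_new_order() yields a copy with i_size 0 and
only the parent grows to 10.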