Message ID | 20110817185619.4660.78543.stgit@localhost6 (mailing list archive) |
---|---|
State | New, archived |
On Wed, 17 Aug 2011 22:56:19 +0400 Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> iput() shouldn't be called for inodes in I_NEW state,
> lets call __destroy_inode() and btrfs_destroy_inode() instead

Hi Konstantin!

Running 'sync' program after the load does not finish and eats 100%CPU
busy-waiting for something in kernel.

It's easy to reproduce hang with patch for me. I just run liferea and sync
after it. Without patch I haven't managed to hang btrfs up.
> Running 'sync' program after the load does not finish and eats 100%CPU
> busy-waiting for something in kernel.
>
> It's easy to reproduce hang with patch for me. I just run liferea and sync
> after it. Without patch I haven't managed to hang btrfs up.

And I think it's another btrfs bug. I've managed to reproduce it _without_
your patch and _without_ autodefrag enabled by manually running the
following commands:

$ btrfs fi defrag file-with-20_000-extents
$ sync

I think your patch just shuffles things a bit and forces autodefrag races
to pop up sooner (which is good! :])
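For convenience, the two-command reproducer above can be wrapped in a small script. This is a sketch, not from the thread: the target file name is a placeholder for any heavily fragmented file (filefrag(8) reports the extent count), and timeout(1) is used so a hung sync shows up as a message instead of blocking the shell forever.

```shell
#!/bin/sh
# Sketch of the manual defrag+sync reproducer described above.
set -u

# Extract the extent count from filefrag output such as
# "/home/st/file: 19070 extents found".
parse_extents() {
    awk '{print $2}'
}

reproduce() {
    target=$1
    echo "extents before defrag: $(filefrag "$target" | parse_extents)"
    btrfs filesystem defrag "$target"
    # If sync does not return within 60 seconds, assume we hit the hang.
    timeout 60 sync || echo "sync did not return within 60s"
}

# Usage (as root, on a btrfs mount, with a heavily fragmented file):
#   reproduce /path/to/fragmented/file
```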
On 08/30/2011 12:53 PM, Sergei Trofimovich wrote:
>> Running 'sync' program after the load does not finish and eats
>> 100%CPU busy-waiting for something in kernel.
>>
>> It's easy to reproduce hang with patch for me. I just run
>> liferea and sync after it. Without patch I haven't managed to
>> hang btrfs up.
>
> And I think it's another btrfs bug. I've managed to reproduce it
> _without_ your patch and _without_ autodefrag enabled by manually
> running the following commands:
>
> $ btrfs fi defrag file-with-20_000-extents
> $ sync
>
> I think your patch just shuffles things a bit and forces
> autodefrag races to pop-up sooner (which is good! :])

Sergei, can you do sysrq+w when this is happening, and maybe turn on the
softlockup detector so we can see where sync is getting stuck? Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 30 Aug 2011 14:02:37 -0400 Josef Bacik <josef@redhat.com> wrote:

> [... earlier context snipped ...]
>
> Sergei, can you do sysrq+w when this is happening, and maybe turn on
> the softlockup detector so we can see where sync is getting stuck?
> Thanks,

Sure. As I keep saying about the 2 cases on IRC, I will state both here
explicitly:

== The First Issue (aka "The Hung sync()" case) ==

- it's an unpatched Linus v3.1-rc4-80-g0f43dd5
- /dev/root on / type btrfs (rw,noatime,compress=lzo)
- 50% full 30GB filesystem (usual nonmixed mode)

How I hung it:

$ /usr/sbin/filefrag ~/.bogofilter/wordlist.db
/home/st/.bogofilter/wordlist.db: 19070 extents found

The file is a 138MB sqlite database for a bayesian SPAM filter; it's being
read and written every 20 minutes or so. Maybe it was even being written at
defrag/sync time!

$ ~/dev/git/btrfs-progs-unstable/btrfs fi defrag ~/.bogofilter/wordlist.db
$ sync
^C <hung in D-state>

I didn't try to reproduce it yet. As for lockdep, I'll try, but I'm afraid
I will fail to reproduce it; I'll try tomorrow. I suspect I'll need to
seriously fragment some file first down to such a horrible state.

With David's help I've got some (hopefully relevant) info:

#!/bin/sh -x
for i in $(ps aux|grep " D[+ ]\?"|awk '{print $2}'); do
    ps $i
    sudo cat /proc/$i/stack
done

  PID TTY      STAT   TIME COMMAND
 1291 ?        D      0:00 [btrfs-endio-wri]
[<ffffffff8130055d>] btrfs_tree_read_lock+0x6d/0x120
[<ffffffff812b8e88>] btrfs_search_slot+0x698/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812ca10f>] __btrfs_lookup_bio_sums+0x1cf/0x3e0
[<ffffffff812ca371>] btrfs_lookup_bio_sums+0x11/0x20
[<ffffffff812d6a50>] btrfs_submit_bio_hook+0x140/0x170
[<ffffffff812ed594>] submit_one_bio+0x64/0xa0
[<ffffffff812f14f5>] extent_readpages+0xe5/0x100
[<ffffffff812d7aaa>] btrfs_readpages+0x1a/0x20
[<ffffffff810a6a02>] __do_page_cache_readahead+0x1d2/0x280
[<ffffffff810a6d8c>] ra_submit+0x1c/0x20
[<ffffffff810a6ebd>] ondemand_readahead+0x12d/0x270
[<ffffffff810a70cc>] page_cache_sync_readahead+0x2c/0x40
[<ffffffff81309987>] __load_free_space_cache+0x1a7/0x5b0
[<ffffffff81309e61>] load_free_space_cache+0xd1/0x190
[<ffffffff812be07b>] cache_block_group+0xab/0x290
[<ffffffff812c3def>] find_free_extent.clone.71+0x39f/0xab0
[<ffffffff812c5160>] btrfs_reserve_extent+0xe0/0x170
[<ffffffff812c56df>] btrfs_alloc_free_block+0xcf/0x330
[<ffffffff812b498d>] __btrfs_cow_block+0x11d/0x4a0
[<ffffffff812b4df8>] btrfs_cow_block+0xe8/0x1a0
[<ffffffff812b8965>] btrfs_search_slot+0x175/0x8b0
[<ffffffff812c9e18>] btrfs_lookup_csum+0x68/0x190
[<ffffffff812caf6e>] btrfs_csum_file_blocks+0xbe/0x670
[<ffffffff812d7d91>] add_pending_csums.clone.39+0x41/0x60
[<ffffffff812da528>] btrfs_finish_ordered_io+0x218/0x310
[<ffffffff812da635>] btrfs_writepage_end_io_hook+0x15/0x20
[<ffffffff8130c71a>] end_compressed_bio_write+0x7a/0xe0
[<ffffffff811146f8>] bio_endio+0x18/0x30
[<ffffffff812cd8fc>] end_workqueue_fn+0xec/0x120
[<ffffffff812fb0ac>] worker_loop+0xac/0x520
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

  PID TTY      STAT   TIME COMMAND
 1296 ?        D      0:00 [btrfs-transacti]
[<ffffffff812d3d6e>] btrfs_commit_transaction+0x22e/0x870
[<ffffffff812cd5bb>] transaction_kthread+0x26b/0x280
[<ffffffff8105d486>] kthread+0x96/0xa0
[<ffffffff815f9214>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

  PID TTY      STAT   TIME COMMAND
 2891 pts/0    D+     0:00 sync
[<ffffffff8109d069>] sleep_on_page+0x9/0x10
[<ffffffff8109d28e>] wait_on_page_bit+0x6e/0x80
[<ffffffff8109d9aa>] filemap_fdatawait_range+0xfa/0x190
[<ffffffff8109da62>] filemap_fdatawait+0x22/0x30
[<ffffffff8110ae5a>] sync_inodes_sb+0x17a/0x1e0
[<ffffffff8110eae8>] __sync_filesystem+0x78/0x80
[<ffffffff8110eb07>] sync_one_sb+0x17/0x20
[<ffffffff810e9cc5>] iterate_supers+0x95/0xf0
[<ffffffff8110ea2b>] sync_filesystems+0x1b/0x20
[<ffffffff8110eb8c>] sys_sync+0x1c/0x40
[<ffffffff815f813b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

  PID TTY      STAT   TIME COMMAND
cat: /proc/4454/stack: No such file or directory

disassembled btrfs_tree_read_lock:

ffffffff813004f0 <btrfs_tree_read_lock>        push   %rbp
ffffffff813004f1 <btrfs_tree_read_lock+0x1>    mov    %gs:0xb5c0,%rax
ffffffff813004fa <btrfs_tree_read_lock+0xa>    mov    %rsp,%rbp
ffffffff813004fd <btrfs_tree_read_lock+0xd>    push   %r15
ffffffff813004ff <btrfs_tree_read_lock+0xf>    push   %r14
ffffffff81300501 <btrfs_tree_read_lock+0x11>   lea    0x6c(%rdi),%r14
ffffffff81300505 <btrfs_tree_read_lock+0x15>   push   %r13
ffffffff81300507 <btrfs_tree_read_lock+0x17>   lea    0x70(%rdi),%r13
ffffffff8130050b <btrfs_tree_read_lock+0x1b>   push   %r12
ffffffff8130050d <btrfs_tree_read_lock+0x1d>   lea    -0x60(%rbp),%r12
ffffffff81300511 <btrfs_tree_read_lock+0x21>   push   %rbx
ffffffff81300512 <btrfs_tree_read_lock+0x22>   lea    0x18(%r12),%r15
ffffffff81300517 <btrfs_tree_read_lock+0x27>   lea    0x5c(%rdi),%rbx
ffffffff8130051b <btrfs_tree_read_lock+0x2b>   sub    $0x58,%rsp
ffffffff8130051f <btrfs_tree_read_lock+0x2f>   mov    %rdi,-0x70(%rbp)
ffffffff81300523 <btrfs_tree_read_lock+0x33>   mov    %rax,-0x68(%rbp)
ffffffff81300527 <btrfs_tree_read_lock+0x37>   mov    %rax,-0x78(%rbp)
ffffffff8130052b <btrfs_tree_read_lock+0x3b>   nopl   0x0(%rax,%rax,1)
ffffffff81300530 <btrfs_tree_read_lock+0x40>   mov    (%rbx),%eax
ffffffff81300532 <btrfs_tree_read_lock+0x42>   test   %eax,%eax
ffffffff81300534 <btrfs_tree_read_lock+0x44>   je     ffffffff8130057e <btrfs_tree_read_lock+0x8e>
ffffffff81300536 <btrfs_tree_read_lock+0x46>   mov    -0x68(%rbp),%rax
ffffffff8130053a <btrfs_tree_read_lock+0x4a>   movq   $0x0,-0x60(%rbp)
ffffffff81300542 <btrfs_tree_read_lock+0x52>   movq   $0xffffffff8105d950,-0x50(%rbp)
ffffffff8130054a <btrfs_tree_read_lock+0x5a>   mov    %r15,-0x48(%rbp)
ffffffff8130054e <btrfs_tree_read_lock+0x5e>   mov    %r15,-0x40(%rbp)
ffffffff81300552 <btrfs_tree_read_lock+0x62>   mov    %rax,-0x58(%rbp)
ffffffff81300556 <btrfs_tree_read_lock+0x66>   jmp    ffffffff8130055d <btrfs_tree_read_lock+0x6d>
ffffffff81300558 <btrfs_tree_read_lock+0x68>   callq  ffffffff815f4670 <schedule>
ffffffff8130055d <btrfs_tree_read_lock+0x6d>   mov    $0x2,%edx
ffffffff81300562 <btrfs_tree_read_lock+0x72>   mov    %r12,%rsi
ffffffff81300565 <btrfs_tree_read_lock+0x75>   mov    %r13,%rdi
ffffffff81300568 <btrfs_tree_read_lock+0x78>   callq  ffffffff8105db90 <prepare_to_wait>
ffffffff8130056d <btrfs_tree_read_lock+0x7d>   mov    (%rbx),%eax
ffffffff8130056f <btrfs_tree_read_lock+0x7f>   test   %eax,%eax
ffffffff81300571 <btrfs_tree_read_lock+0x81>   jne    ffffffff81300558 <btrfs_tree_read_lock+0x68>
ffffffff81300573 <btrfs_tree_read_lock+0x83>   mov    %r12,%rsi
ffffffff81300576 <btrfs_tree_read_lock+0x86>   mov    %r13,%rdi
ffffffff81300579 <btrfs_tree_read_lock+0x89>   callq  ffffffff8105da80 <finish_wait>
ffffffff8130057e <btrfs_tree_read_lock+0x8e>   mov    %r14,%rdi
ffffffff81300581 <btrfs_tree_read_lock+0x91>   callq  ffffffff815f76d0 <_raw_read_lock>
ffffffff81300586 <btrfs_tree_read_lock+0x96>   mov    (%rbx),%eax
ffffffff81300588 <btrfs_tree_read_lock+0x98>   test   %eax,%eax
ffffffff8130058a <btrfs_tree_read_lock+0x9a>   je     ffffffff813005eb <btrfs_tree_read_lock+0xfb>
ffffffff8130058c <btrfs_tree_read_lock+0x9c>   mov    %r14,%rdi
ffffffff8130058f <btrfs_tree_read_lock+0x9f>   callq  ffffffff815f73f0 <_raw_read_unlock>
ffffffff81300594 <btrfs_tree_read_lock+0xa4>   mov    (%rbx),%eax
ffffffff81300596 <btrfs_tree_read_lock+0xa6>   test   %eax,%eax
ffffffff81300598 <btrfs_tree_read_lock+0xa8>   je     ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff8130059a <btrfs_tree_read_lock+0xaa>   mov    -0x78(%rbp),%rax
ffffffff8130059e <btrfs_tree_read_lock+0xae>   movq   $0x0,-0x60(%rbp)
ffffffff813005a6 <btrfs_tree_read_lock+0xb6>   movq   $0xffffffff8105d950,-0x50(%rbp)
ffffffff813005ae <btrfs_tree_read_lock+0xbe>   mov    %r15,-0x48(%rbp)
ffffffff813005b2 <btrfs_tree_read_lock+0xc2>   mov    %r15,-0x40(%rbp)
ffffffff813005b6 <btrfs_tree_read_lock+0xc6>   mov    %rax,-0x58(%rbp)
ffffffff813005ba <btrfs_tree_read_lock+0xca>   jmp    ffffffff813005c5 <btrfs_tree_read_lock+0xd5>
ffffffff813005bc <btrfs_tree_read_lock+0xcc>   nopl   0x0(%rax)
ffffffff813005c0 <btrfs_tree_read_lock+0xd0>   callq  ffffffff815f4670 <schedule>
ffffffff813005c5 <btrfs_tree_read_lock+0xd5>   mov    $0x2,%edx
ffffffff813005ca <btrfs_tree_read_lock+0xda>   mov    %r12,%rsi
ffffffff813005cd <btrfs_tree_read_lock+0xdd>   mov    %r13,%rdi
ffffffff813005d0 <btrfs_tree_read_lock+0xe0>   callq  ffffffff8105db90 <prepare_to_wait>
ffffffff813005d5 <btrfs_tree_read_lock+0xe5>   mov    (%rbx),%eax
ffffffff813005d7 <btrfs_tree_read_lock+0xe7>   test   %eax,%eax
ffffffff813005d9 <btrfs_tree_read_lock+0xe9>   jne    ffffffff813005c0 <btrfs_tree_read_lock+0xd0>
ffffffff813005db <btrfs_tree_read_lock+0xeb>   mov    %r12,%rsi
ffffffff813005de <btrfs_tree_read_lock+0xee>   mov    %r13,%rdi
ffffffff813005e1 <btrfs_tree_read_lock+0xf1>   callq  ffffffff8105da80 <finish_wait>
ffffffff813005e6 <btrfs_tree_read_lock+0xf6>   jmpq   ffffffff81300530 <btrfs_tree_read_lock+0x40>
ffffffff813005eb <btrfs_tree_read_lock+0xfb>   mov    -0x70(%rbp),%rax
ffffffff813005ef <btrfs_tree_read_lock+0xff>   lock incl 0x58(%rax)
ffffffff813005f3 <btrfs_tree_read_lock+0x103>  lock incl 0x64(%rax)
ffffffff813005f7 <btrfs_tree_read_lock+0x107>  add    $0x58,%rsp
ffffffff813005fb <btrfs_tree_read_lock+0x10b>  pop    %rbx
ffffffff813005fc <btrfs_tree_read_lock+0x10c>  pop    %r12
ffffffff813005fe <btrfs_tree_read_lock+0x10e>  pop    %r13
ffffffff81300600 <btrfs_tree_read_lock+0x110>  pop    %r14
ffffffff81300602 <btrfs_tree_read_lock+0x112>  pop    %r15
ffffffff81300604 <btrfs_tree_read_lock+0x114>  leaveq
ffffffff81300605 <btrfs_tree_read_lock+0x115>  retq
ffffffff81300606 <btrfs_tree_read_lock+0x116>  nopw   %cs:0x0(%rax,%rax,1)

== The Second Issue (aka "The Busy Looping sync()" case) ==

The box is different from the first, so conditions are a bit different.

- it's a _patched_ Linus 3.1.0-rc3-00271-g3230739 (+ the patch from $SUBJ on top)
- /dev/root on / type btrfs (rw,noatime,autodefrag) (note autodefrag!)
- 15% full 594GB filesystem (usual nonmixed mode)

How I hung it:

$ liferea
<wait for it to calm down; it does a lot of SQLite reads/writes>
$ sync

Got the CPU 100% loaded. Perfectly reproducible. I don't remember if it
exits or hangs. Will send a detailed followup today.
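The ad-hoc stack-collection loop above can be made a bit more robust. A sketch, with some assumptions: /proc/<pid>/stack needs root and a kernel built with CONFIG_STACKTRACE, and a process can leave D state between the ps and the cat, so a missing stack file is tolerated rather than treated as an error.

```shell
#!/bin/sh
# Sketch: dump kernel stacks of all tasks in uninterruptible sleep (D state).

# Read "PID STAT" pairs on stdin, print PIDs whose state starts with D.
dstate_pids() {
    awk '$2 ~ /^D/ {print $1}'
}

collect_stacks() {
    ps -eo pid=,stat= | dstate_pids | while read -r pid; do
        echo "=== pid $pid ==="
        ps -o pid=,comm= -p "$pid"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
    done
}

# Usage (as root): collect_stacks
```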
On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:

> [... full quote of the previous message, including the btrfs-endio-wri
> stack trace, snipped ...]

Ok this should have been fixed with

Btrfs: use the commit_root for reading free_space_inode crcs

which is commit 2cf8572dac62cc2ff7e995173e95b6c694401b3f. Does your
kernel have this commit? Because if it does then we did something
wrong. If not it should be in Linus's latest tree, so update and it
should go away (hopefully). Thanks,

Josef
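"Does your kernel have this commit?" can be answered mechanically from the kernel git tree you built from. A sketch (the commit hash below is the one Josef names; `git merge-base --is-ancestor` exits 0 iff the commit is contained in HEAD):

```shell
#!/bin/sh
# Sketch: check whether a given commit is an ancestor of the checked-out HEAD.

has_commit() {
    # Exit status 0 means "HEAD contains this commit"; errors (e.g. an
    # unknown hash) are silenced and reported as "not present".
    git merge-base --is-ancestor "$1" HEAD 2>/dev/null
}

# Usage, from the kernel source tree:
#   has_commit 2cf8572dac62cc2ff7e995173e95b6c694401b3f && echo present || echo missing
```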
On 08/30/2011 03:40 PM, Josef Bacik wrote:

> On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
>
> [... quoted context and stack trace snipped ...]
>
> Ok this should have been fixed with
>
> Btrfs: use the commit_root for reading free_space_inode crcs
>
> which is commit 2cf8572dac62cc2ff7e995173e95b6c694401b3f. Does your
> kernel have this commit? Because if it does then we did something
> wrong. If not it should be in Linus's latest tree, so update and it
> should go away (hopefully). Thanks,

Oops, looks like that patch won't fix it completely. I just sent another
patch that will fix this problem totally, sorry about that:

[PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

Thanks,

Josef
> On 08/30/2011 03:40 PM, Josef Bacik wrote:
> > On 08/30/2011 03:31 PM, Sergei Trofimovich wrote:
> >
> > [... quoted context and stack trace snipped ...]
> >
> > Ok this should have been fixed with
> >
> > Btrfs: use the commit_root for reading free_space_inode crcs
> >
> > which is commit 2cf8572dac62cc2ff7e995173e95b6c694401b3f. Does your
> > kernel have this commit? Because if it does then we did something
> > wrong. If not it should be in linus's latest tree, so update and it
> > should go away (hopefully). Thanks,

Yeah, this one was in my local tree when it hung.

> Oops looks like that patch won't fix it completely, I just sent another
> patch that will fix this problem totally, sorry about that
>
> [PATCH] Btrfs: skip locking if searching the commit root in lookup_csums

I'll try to reproduce/test it tomorrow.

About the second one:

> ==The Second Issue (aka "The Busy Looping sync()" case) ==
> The box is different from first, so conditions are a bit different.
> - /dev/root on / type btrfs (rw,noatime,autodefrag)
>   (note autodefrag!)
> - 15% full 594GB filesystem (usual nonmixed mode)
>
> $ liferea
> <wait it to calm down. it does a lot of SQLite reads/writes>
> $ sync
> Got CPU is 100% loaded <hung>

Still reproducible with the 2 patches above + the $SUBJ one. strace says it
hangs in the sync() syscall. The stack trace is odd:

# cat /proc/`pidof sync`/stack
[<ffffffffffffffff>] 0xffffffffffffffff
> About the second one:
>
> > ==The Second Issue (aka "The Busy Looping sync()" case) ==
> > The box is different from first, so conditions are a bit different.
> > - /dev/root on / type btrfs (rw,noatime,autodefrag)
> >   (note autodefrag!)
> > - 15% full 594GB filesystem (usual nonmixed mode)
> >
> > $ liferea
> > <wait it to calm down. it does a lot of SQLite reads/writes>
> > $ sync
> > Got CPU is 100% loaded <hung>
>
> Still reproducible with the 2 patches above + the $SUBJ one. strace says
> it hangs in the sync() syscall. The stack trace is odd:
> # cat /proc/`pidof sync`/stack
> [<ffffffffffffffff>] 0xffffffffffffffff

"Got" the stack of the looper with such a hack:

$ while :; do sudo echo "-"; cat /proc/2304/stack | fgrep -v '[<ffffffffffffffff>] 0xffffffffffffffff'; done | uniq

[<ffffffff8109d39d>] find_get_pages_tag+0x14d/0x1a0
[<ffffffff81076981>] __lock_acquire+0x5a1/0xbd0
[<ffffffff8109e7bd>] filemap_fdatawait_range+0x9d/0x1b0
[<ffffffff8109e8f2>] filemap_fdatawait+0x22/0x30
[<ffffffff81106a04>] sync_inodes_sb+0x1c4/0x250
[<ffffffff8110b398>] __sync_filesystem+0x78/0x80
[<ffffffff8110b3b7>] sync_one_sb+0x17/0x20
[<ffffffff810e3fbe>] iterate_supers+0x7e/0xe0
[<ffffffff8110b400>] sys_sync+0x40/0x60
[<ffffffff81456afb>] system_call_fastpath+0x16/0x1b
-
[<ffffffff811069f1>] sync_inodes_sb+0x1b1/0x250
[<ffffffff8110b398>] __sync_filesystem+0x78/0x80
[<ffffffff8110b3b7>] sync_one_sb+0x17/0x20
[<ffffffff810e3fbe>] iterate_supers+0x7e/0xe0
[<ffffffff8110b400>] sys_sync+0x40/0x60
[<ffffffff81456afb>] system_call_fastpath+0x16/0x1b
-
[<ffffffff811069f1>] sync_inodes_sb+0x1b1/0x250
[<ffffffff8110b398>] __sync_filesystem+0x78/0x80
[<ffffffff8110b3b7>] sync_one_sb+0x17/0x20
[<ffffffff810e3fbe>] iterate_supers+0x7e/0xe0
[<ffffffff8110b400>] sys_sync+0x40/0x60
[<ffffffff81456afb>] system_call_fastpath+0x16/0x1b
-
[<ffffffff814564b6>] retint_kernel+0x26/0x30
[<ffffffff8109d398>] find_get_pages_tag+0x148/0x1a0
[<ffffffff810a7a60>] pagevec_lookup_tag+0x20/0x30
[<ffffffff8109e7bd>] filemap_fdatawait_range+0x9d/0x1b0
[<ffffffff8109e8f2>] filemap_fdatawait+0x22/0x30
[<ffffffff81106a04>] sync_inodes_sb+0x1c4/0x250
[<ffffffff8110b398>] __sync_filesystem+0x78/0x80
[<ffffffff8110b3b7>] sync_one_sb+0x17/0x20
[<ffffffff810e3fbe>] iterate_supers+0x7e/0xe0
[<ffffffff8110b400>] sys_sync+0x40/0x60
[<ffffffff81456afb>] system_call_fastpath+0x16/0x1b

Looks like the dirty inode count (or dirty page count) is affected by the
$SUBJ patch.
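Incidentally, the `sudo echo "-"` in the sampling hack above elevates only the echo; the cat that actually needs privileges to read another process's /proc/<pid>/stack still runs unprivileged. A cleaned-up sampling sketch (run the whole thing as root; the PID argument is whatever `pidof sync` returns on your box):

```shell
#!/bin/sh
# Sketch: sample a process's kernel stack repeatedly, filtering out the
# empty "[<ffffffffffffffff>] 0xffffffffffffffff" samples, and collapse
# consecutive duplicates with uniq.

sample_stack() {
    pid=$1
    n=${2:-100}   # number of samples
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "-"
        grep -Fv '[<ffffffffffffffff>] 0xffffffffffffffff' \
            "/proc/$pid/stack" 2>/dev/null
        i=$((i + 1))
    done | uniq
}

# Usage (as root): sample_stack "$(pidof sync)" 1000
```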
Hi,

I saw something different with your patch. After creating and deleting lots
of subvolumes followed by an umount, there is one unpleasant interaction
with orphans. The first mount attempt fails with:

# mount /dev/sdb5
mount: Stale NFS file handle

syslog:
[  256.050787] btrfs: could not do orphan cleanup -116
[  256.962781] btrfs: open_ctree failed

-116 is -ESTALE, which your patch adds. Another mount attempt succeeds, with
no relevant message in syslog. Reproducible.

btrfs_orphan_cleanup
  btrfs_iget -> fails

2373                 inode = btrfs_iget(root->fs_info->sb, &found_key, root, NULL);
2374                 if (IS_ERR(inode)) {
2375                         ret = PTR_ERR(inode);
2376                         goto out;
2377                 }
...
2440 out:
2441         if (ret)
2442                 printk(KERN_CRIT "btrfs: could not do orphan cleanup %d\n", ret);
2443         btrfs_free_path(path);
2444         return ret;
2445 }

I'm concerned about leaving some unprocessed orphans "somewhere".

david
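The reproducer David describes can be scripted. A sketch, with the mount point and iteration count as assumptions (run for real it needs root and an existing btrfs mount; DRY_RUN=1 prints the commands instead of executing them):

```shell
#!/bin/sh
# Sketch: create and delete many subvolumes, then umount and remount, to
# exercise the orphan-cleanup path on the next mount.
set -u

do_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

churn_subvolumes() {
    mnt=$1
    count=${2:-100}
    i=0
    while [ "$i" -lt "$count" ]; do
        do_cmd btrfs subvolume create "$mnt/subvol-$i"
        do_cmd btrfs subvolume delete "$mnt/subvol-$i"
        i=$((i + 1))
    done
    do_cmd umount "$mnt"
    # In David's report the first mount attempt after this failed with -ESTALE.
    do_cmd mount "$mnt"
}

# Usage (as root): churn_subvolumes /mnt/btrfs-test 200
```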
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15fceef..3e949bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3952,7 +3952,6 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 			 struct btrfs_root *root, int *new)
 {
 	struct inode *inode;
-	int bad_inode = 0;
 
 	inode = btrfs_iget_locked(s, location->objectid, root);
 	if (!inode)
@@ -3968,15 +3967,12 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location,
 			if (new)
 				*new = 1;
 		} else {
-			bad_inode = 1;
+			__destroy_inode(inode);
+			btrfs_destroy_inode(inode);
+			inode = ERR_PTR(-ESTALE);
 		}
 	}
 
-	if (bad_inode) {
-		iput(inode);
-		inode = ERR_PTR(-ESTALE);
-	}
-
 	return inode;
 }
iput() shouldn't be called for inodes in I_NEW state,
lets call __destroy_inode() and btrfs_destroy_inode() instead

[    1.871723] WARNING: at fs/inode.c:1309 iput+0x1d9/0x200()
[    1.873722] Modules linked in:
[    1.873722] Pid: 1, comm: swapper Tainted: G        W   3.1.0-rc2-zurg #58
[    1.875722] Call Trace:
[    1.875722]  [<ffffffff8113cb99>] ? iput+0x1d9/0x200
[    1.876722]  [<ffffffff81044c3a>] warn_slowpath_common+0x7a/0xb0
[    1.877722]  [<ffffffff81044c85>] warn_slowpath_null+0x15/0x20
[    1.879722]  [<ffffffff8113cb99>] iput+0x1d9/0x200
[    1.879722]  [<ffffffff81295cf4>] btrfs_iget+0x1c4/0x450
[    1.881721]  [<ffffffff812b7e6b>] ? btrfs_tree_read_unlock_blocking+0x3b/0x60
[    1.882721]  [<ffffffff8111769a>] ? kmem_cache_free+0x2a/0x160
[    1.883721]  [<ffffffff812966f3>] btrfs_lookup_dentry+0x413/0x490
[    1.885721]  [<ffffffff8103b1e1>] ? get_parent_ip+0x11/0x50
[    1.886720]  [<ffffffff81296781>] btrfs_lookup+0x11/0x30
[    1.887720]  [<ffffffff8112de50>] d_alloc_and_lookup+0x40/0x80
[    1.888720]  [<ffffffff8113ac10>] ? d_lookup+0x30/0x50
[    1.889720]  [<ffffffff811301a8>] do_lookup+0x288/0x370
[    1.890720]  [<ffffffff8103b1e1>] ? get_parent_ip+0x11/0x50
[    1.891720]  [<ffffffff81132210>] do_last+0xe0/0x910
[    1.892720]  [<ffffffff81132b4d>] path_openat+0xcd/0x3a0
[    1.893719]  [<ffffffff813bab4b>] ? wait_for_xmitr+0x3b/0xa0
[    1.895719]  [<ffffffff8131c50a>] ? put_dec_full+0x5a/0xb0
[    1.896719]  [<ffffffff813babdb>] ? serial8250_console_putchar+0x2b/0x40
[    1.897719]  [<ffffffff81132e7d>] do_filp_open+0x3d/0xa0
[    1.898719]  [<ffffffff8103b1e1>] ? get_parent_ip+0x11/0x50
[    1.899718]  [<ffffffff8103b1e1>] ? get_parent_ip+0x11/0x50
[    1.900718]  [<ffffffff816e80fd>] ? sub_preempt_count+0x9d/0xd0
[    1.902718]  [<ffffffff8112a09d>] open_exec+0x2d/0xf0
[    1.903718]  [<ffffffff8112aaaf>] do_execve_common.isra.32+0x12f/0x340
[    1.906717]  [<ffffffff8112acd6>] do_execve+0x16/0x20
[    1.907717]  [<ffffffff8100af02>] sys_execve+0x42/0x70
[    1.908717]  [<ffffffff816ed968>] kernel_execve+0x68/0xd0
[    1.909717]  [<ffffffff816d828e>] ? run_init_process+0x1e/0x20
[    1.911717]  [<ffffffff816d831e>] init_post+0x8e/0xc0
[    1.912716]  [<ffffffff81cb8c79>] kernel_init+0x13d/0x13d
[    1.913716]  [<ffffffff816ed8f4>] kernel_thread_helper+0x4/0x10
[    1.914716]  [<ffffffff81cb8b3c>] ? start_kernel+0x33f/0x33f
[    1.915716]  [<ffffffff816ed8f0>] ? gs_change+0xb/0xb

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 fs/btrfs/inode.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)