Message ID | BANLkTin+D66OuXUbiRkDBNiLx--8eu2hkw@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 19 May 2011, Simon Tian wrote: > Hi Sage, > > I've test this patch. > With 8 times test, only 1 (the first time) got a thread assert fail.As > back trace 1 showed: Do you have a simple reproducer for this? sage > ========================== back trace 1 ================================== > (gdb) bt > #0 0x000000367fa30265 in raise () from /lib64/libc.so.6 > #1 0x000000367fa31d10 in abort () from /lib64/libc.so.6 > #2 0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () > from /usr/lib64/libstdc++.so.6 > #3 0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6 > #4 0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6 > #5 0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6 > #6 0x00007ffff7c51a54 in ceph::__ceph_assert_fail > (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", > file=0x7ffff7ce1431 "client/Client.cc", line=2190, > func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") > at common/assert.cc:86 > #7 0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, > onfinish=0x7fffe8000b60) at client/Client.cc:2190 > #8 0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, > in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) > at client/Client.cc:2930 > #9 0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, > m=0x631fc0) at client/Client.cc:2711 > #10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, > m=0x631fc0) at client/Client.cc:1444 > #11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch > (this=0x628350, m=0x631fc0) at msg/Messenger.h:98 > #12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry > (this=0x628350) at msg/SimpleMessenger.cc:352 > #13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry > (this=0x6287d8) at msg/SimpleMessenger.h:533 > #14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at > ./common/Thread.h:41 > #15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0 > #16 0x000000367fad3c2d in clone () from /lib64/libc.so.6 > ============================================================ > > In addition, when writing data for a long time, the write thread will > hang on a pthread_cond_wait, as back trace 2 showed: > ========================== back trace 2 ================================== > Thread 10 (Thread 0x43f87940 (LWP 19986)): > #0 0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from > /lib64/libpthread.so.0 > #1 0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, > mutex=@0x66d790) at ./common/Cond.h:46 > #2 0x00007f147a86630d in Client::wait_on_list (this=0x66d430, > ls=@0x674df0) at client/Client.cc:2140 > #3 0x00007f147a8771ff in Client::get_caps (this=0x66d430, > in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) > at client/Client.cc:1827 > #4 0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, > offset=235470336, size=512, > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > times>...) at client/Client.cc:5055 > #5 0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > times>..., size=512, offset=235470336) at client/Client.cc:5007 > #6 0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 'ÿÿ' <repeats > 33 times>, ".001", 'ÿÿ' <repeats 165 times>..., size=512, > offset=235470336) at libceph.cc:322 > #7 0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, > offset=235470336, length=512, > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > times>..., queue=@0x644b60, io=0x6456a0) > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191 > #8 0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, > io=0x6456a0, queue=@0x644b60, netfd=5) > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268 > #9 0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at > /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390 > #10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0 > #11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6 > ============================================================ > > Seem not very safe with the patch. > > So I applied this patch: > diff --git a/src/client/Client.cc b/src/client/Client.cc > index 7f7fb08..bf0997a 100644 > --- a/src/client/Client.cc > +++ b/src/client/Client.cc > @@ -4934,7 +4934,6 @@ public: > > void Client::sync_write_commit(Inode *in) > { > - client_lock.Lock(); > + int r = client_lock.Lock(); > assert(unsafe_sync_write > 0); > unsafe_sync_write--; > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > } > > put_inode(in); > + if (r) > client_lock.Unlock(); > } > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > After some test, back trace 1 didn't appear, but back trace 2 still appear. > Hmm,, not safe also. > > Thx! > Simon > > > > 2011/5/19 Sage Weil <sage@newdream.net>: > > Hi Simon, > > > > On Wed, 18 May 2011, Simon Tian wrote: > > > >> Hi, > >> > >> Could any one give me an answer? > >> > >> My application need to open one file with two clients in different > >> hosts. One for write/read and one for read only. The client could be > >> developed based on libceph or librbd. > >> I tried librbd, exception appear too. > > > > I opened a bug for this, http://tracker.newdream.net/issues/1097. > > > > Can you try the patch below? I this may just be something missed in a > > locking rewrite way back when in a rare code path: > > > > Thanks! > > sage > > > > > > > > diff --git a/src/client/Client.cc b/src/client/Client.cc > > index 7f7fb08..bf0997a 100644 > > --- a/src/client/Client.cc > > +++ b/src/client/Client.cc > > @@ -4934,7 +4934,6 @@ public: > > > > void Client::sync_write_commit(Inode *in) > > { > > - client_lock.Lock(); > > assert(unsafe_sync_write > 0); > > unsafe_sync_write--; > > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > > } > > > > put_inode(in); > > - > > - client_lock.Unlock(); > > } > > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > >
On Thu, 19 May 2011, Sage Weil wrote: > On Thu, 19 May 2011, Simon Tian wrote: > > Hi Sage, > > > > I've test this patch. > > With 8 times test, only 1 (the first time) got a thread assert fail.As > > back trace 1 showed: > > Do you have a simple reproducer for this? Nevermind, I'm easily triggering this with two cfuse mounts. sage > > sage > > > > ========================== back trace 1 ================================== > > (gdb) bt > > #0 0x000000367fa30265 in raise () from /lib64/libc.so.6 > > #1 0x000000367fa31d10 in abort () from /lib64/libc.so.6 > > #2 0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () > > from /usr/lib64/libstdc++.so.6 > > #3 0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6 > > #4 0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6 > > #5 0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6 > > #6 0x00007ffff7c51a54 in ceph::__ceph_assert_fail > > (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", > > file=0x7ffff7ce1431 "client/Client.cc", line=2190, > > func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") > > at common/assert.cc:86 > > #7 0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, > > onfinish=0x7fffe8000b60) at client/Client.cc:2190 > > #8 0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, > > in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) > > at client/Client.cc:2930 > > #9 0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, > > m=0x631fc0) at client/Client.cc:2711 > > #10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, > > m=0x631fc0) at client/Client.cc:1444 > > #11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch > > (this=0x628350, m=0x631fc0) at msg/Messenger.h:98 > > #12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry > > (this=0x628350) at msg/SimpleMessenger.cc:352 > > #13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry > > (this=0x6287d8) at msg/SimpleMessenger.h:533 > > #14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at > > ./common/Thread.h:41 > > #15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0 > > #16 0x000000367fad3c2d in clone () from /lib64/libc.so.6 > > ============================================================ > > > > In addition, when writing data for a long time, the write thread will > > hang on a pthread_cond_wait, as back trace 2 showed: > > ========================== back trace 2 ================================== > > Thread 10 (Thread 0x43f87940 (LWP 19986)): > > #0 0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from > > /lib64/libpthread.so.0 > > #1 0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, > > mutex=@0x66d790) at ./common/Cond.h:46 > > #2 0x00007f147a86630d in Client::wait_on_list (this=0x66d430, > > ls=@0x674df0) at client/Client.cc:2140 > > #3 0x00007f147a8771ff in Client::get_caps (this=0x66d430, > > in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) > > at client/Client.cc:1827 > > #4 0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, > > offset=235470336, size=512, > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > times>...) at client/Client.cc:5055 > > #5 0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > times>..., size=512, offset=235470336) at client/Client.cc:5007 > > #6 0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 'ÿÿ' <repeats > > 33 times>, ".001", 'ÿÿ' <repeats 165 times>..., size=512, > > offset=235470336) at libceph.cc:322 > > #7 0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, > > offset=235470336, length=512, > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > times>..., queue=@0x644b60, io=0x6456a0) > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191 > > #8 0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, > > io=0x6456a0, queue=@0x644b60, netfd=5) > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268 > > #9 0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at > > /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390 > > #10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0 > > #11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6 > > ============================================================ > > > > Seem not very safe with the patch. > > > > So I applied this patch: > > diff --git a/src/client/Client.cc b/src/client/Client.cc > > index 7f7fb08..bf0997a 100644 > > --- a/src/client/Client.cc > > +++ b/src/client/Client.cc > > @@ -4934,7 +4934,6 @@ public: > > > > void Client::sync_write_commit(Inode *in) > > { > > - client_lock.Lock(); > > + int r = client_lock.Lock(); > > assert(unsafe_sync_write > 0); > > unsafe_sync_write--; > > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > > } > > > > put_inode(in); > > + if (r) > > client_lock.Unlock(); > > } > > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > > > > After some test, back trace 1 didn't appear, but back trace 2 still appear. > > Hmm,, not safe also. > > > > Thx! > > Simon > > > > > > > > 2011/5/19 Sage Weil <sage@newdream.net>: > > > Hi Simon, > > > > > > On Wed, 18 May 2011, Simon Tian wrote: > > > > > >> Hi, > > >> > > >> Could any one give me an answer? > > >> > > >> My application need to open one file with two clients in different > > >> hosts. One for write/read and one for read only. The client could be > > >> developed based on libceph or librbd. > > >> I tried librbd, exception appear too. > > > > > > I opened a bug for this, http://tracker.newdream.net/issues/1097. > > > > > > Can you try the patch below? I this may just be something missed in a > > > locking rewrite way back when in a rare code path: > > > > > > Thanks! > > > sage > > > > > > > > > > > > diff --git a/src/client/Client.cc b/src/client/Client.cc > > > index 7f7fb08..bf0997a 100644 > > > --- a/src/client/Client.cc > > > +++ b/src/client/Client.cc > > > @@ -4934,7 +4934,6 @@ public: > > > > > > void Client::sync_write_commit(Inode *in) > > > { > > > - client_lock.Lock(); > > > assert(unsafe_sync_write > 0); > > > unsafe_sync_write--; > > > > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > > > } > > > > > > put_inode(in); > > > - > > > - client_lock.Unlock(); > > > } > > > > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > >
On Thu, 19 May 2011, Sage Weil wrote: > On Thu, 19 May 2011, Sage Weil wrote: > > On Thu, 19 May 2011, Simon Tian wrote: > > > Hi Sage, > > > > > > I've test this patch. > > > With 8 times test, only 1 (the first time) got a thread assert fail.As > > > back trace 1 showed: > > > > Do you have a simple reproducer for this? > > Nevermind, I'm easily triggering this with two cfuse mounts. Hi Simon, I just pushed a series of fixes to this code to the master branch that fixes both of these crashes in my testing. Can you try it out? Thanks! sage > > sage > > > > > > sage > > > > > > > ========================== back trace 1 ================================== > > > (gdb) bt > > > #0 0x000000367fa30265 in raise () from /lib64/libc.so.6 > > > #1 0x000000367fa31d10 in abort () from /lib64/libc.so.6 > > > #2 0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () > > > from /usr/lib64/libstdc++.so.6 > > > #3 0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6 > > > #4 0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6 > > > #5 0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6 > > > #6 0x00007ffff7c51a54 in ceph::__ceph_assert_fail > > > (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", > > > file=0x7ffff7ce1431 "client/Client.cc", line=2190, > > > func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") > > > at common/assert.cc:86 > > > #7 0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, > > > onfinish=0x7fffe8000b60) at client/Client.cc:2190 > > > #8 0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, > > > in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) > > > at client/Client.cc:2930 > > > #9 0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, > > > m=0x631fc0) at client/Client.cc:2711 > > > #10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, > > > m=0x631fc0) at client/Client.cc:1444 > > > #11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch > > > (this=0x628350, m=0x631fc0) at msg/Messenger.h:98 > > > #12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry > > > (this=0x628350) at msg/SimpleMessenger.cc:352 > > > #13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry > > > (this=0x6287d8) at msg/SimpleMessenger.h:533 > > > #14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at > > > ./common/Thread.h:41 > > > #15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0 > > > #16 0x000000367fad3c2d in clone () from /lib64/libc.so.6 > > > ============================================================ > > > > > > In addition, when writing data for a long time, the write thread will > > > hang on a pthread_cond_wait, as back trace 2 showed: > > > ========================== back trace 2 ================================== > > > Thread 10 (Thread 0x43f87940 (LWP 19986)): > > > #0 0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from > > > /lib64/libpthread.so.0 > > > #1 0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, > > > mutex=@0x66d790) at ./common/Cond.h:46 > > > #2 0x00007f147a86630d in Client::wait_on_list (this=0x66d430, > > > ls=@0x674df0) at client/Client.cc:2140 > > > #3 0x00007f147a8771ff in Client::get_caps (this=0x66d430, > > > in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) > > > at client/Client.cc:1827 > > > #4 0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, > > > offset=235470336, size=512, > > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > > times>...) at client/Client.cc:5055 > > > #5 0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, > > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > > times>..., size=512, offset=235470336) at client/Client.cc:5007 > > > #6 0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 'ÿÿ' <repeats > > > 33 times>, ".001", 'ÿÿ' <repeats 165 times>..., size=512, > > > offset=235470336) at libceph.cc:322 > > > #7 0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, > > > offset=235470336, length=512, > > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 > > > times>..., queue=@0x644b60, io=0x6456a0) > > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191 > > > #8 0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, > > > io=0x6456a0, queue=@0x644b60, netfd=5) > > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268 > > > #9 0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at > > > /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390 > > > #10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0 > > > #11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6 > > > ============================================================ > > > > > > Seem not very safe with the patch. > > > > > > So I applied this patch: > > > diff --git a/src/client/Client.cc b/src/client/Client.cc > > > index 7f7fb08..bf0997a 100644 > > > --- a/src/client/Client.cc > > > +++ b/src/client/Client.cc > > > @@ -4934,7 +4934,6 @@ public: > > > > > > void Client::sync_write_commit(Inode *in) > > > { > > > - client_lock.Lock(); > > > + int r = client_lock.Lock(); > > > assert(unsafe_sync_write > 0); > > > unsafe_sync_write--; > > > > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > > > } > > > > > > put_inode(in); > > > + if (r) > > > client_lock.Unlock(); > > > } > > > > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > > > > > > > After some test, back trace 1 didn't appear, but back trace 2 still appear. > > > Hmm,, not safe also. > > > > > > Thx! > > > Simon > > > > > > > > > > > > 2011/5/19 Sage Weil <sage@newdream.net>: > > > > Hi Simon, > > > > > > > > On Wed, 18 May 2011, Simon Tian wrote: > > > > > > > >> Hi, > > > >> > > > >> Could any one give me an answer? > > > >> > > > >> My application need to open one file with two clients in different > > > >> hosts. One for write/read and one for read only. The client could be > > > >> developed based on libceph or librbd. > > > >> I tried librbd, exception appear too. > > > > > > > > I opened a bug for this, http://tracker.newdream.net/issues/1097. > > > > > > > > Can you try the patch below? I this may just be something missed in a > > > > locking rewrite way back when in a rare code path: > > > > > > > > Thanks! > > > > sage > > > > > > > > > > > > > > > > diff --git a/src/client/Client.cc b/src/client/Client.cc > > > > index 7f7fb08..bf0997a 100644 > > > > --- a/src/client/Client.cc > > > > +++ b/src/client/Client.cc > > > > @@ -4934,7 +4934,6 @@ public: > > > > > > > > void Client::sync_write_commit(Inode *in) > > > > { > > > > - client_lock.Lock(); > > > > assert(unsafe_sync_write > 0); > > > > unsafe_sync_write--; > > > > > > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) > > > > } > > > > > > > > put_inode(in); > > > > - > > > > - client_lock.Unlock(); > > > > } > > > > > > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) > > > > > > > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > >
OK, I'll test the branch and give the result these days. Thanks very much! Simon 2011/5/20 Sage Weil <sage@newdream.net>: > On Thu, 19 May 2011, Sage Weil wrote: >> On Thu, 19 May 2011, Sage Weil wrote: >> > On Thu, 19 May 2011, Simon Tian wrote: >> > > Hi Sage, >> > > >> > > I've test this patch. >> > > With 8 times test, only 1 (the first time) got a thread assert fail.As >> > > back trace 1 showed: >> > >> > Do you have a simple reproducer for this? >> >> Nevermind, I'm easily triggering this with two cfuse mounts. > > Hi Simon, > > I just pushed a series of fixes to this code to the master branch that > fixes both of these crashes in my testing. Can you try it out? > > Thanks! > sage > > > >> >> sage >> >> >> > >> > sage >> > >> > >> > > ========================== back trace 1 ================================== >> > > (gdb) bt >> > > #0 0x000000367fa30265 in raise () from /lib64/libc.so.6 >> > > #1 0x000000367fa31d10 in abort () from /lib64/libc.so.6 >> > > #2 0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () >> > > from /usr/lib64/libstdc++.so.6 >> > > #3 0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6 >> > > #4 0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6 >> > > #5 0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6 >> > > #6 0x00007ffff7c51a54 in ceph::__ceph_assert_fail >> > > (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", >> > > file=0x7ffff7ce1431 "client/Client.cc", line=2190, >> > > func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") >> > > at common/assert.cc:86 >> > > #7 0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, >> > > onfinish=0x7fffe8000b60) at client/Client.cc:2190 >> > > #8 0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, >> > > in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) >> > > at client/Client.cc:2930 >> > > #9 0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, >> > > m=0x631fc0) at client/Client.cc:2711 >> > > #10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, >> > > m=0x631fc0) at client/Client.cc:1444 >> > > #11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch >> > > (this=0x628350, m=0x631fc0) at msg/Messenger.h:98 >> > > #12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry >> > > (this=0x628350) at msg/SimpleMessenger.cc:352 >> > > #13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry >> > > (this=0x6287d8) at msg/SimpleMessenger.h:533 >> > > #14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at >> > > ./common/Thread.h:41 >> > > #15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0 >> > > #16 0x000000367fad3c2d in clone () from /lib64/libc.so.6 >> > > ============================================================ >> > > >> > > In addition, when writing data for a long time, the write thread will >> > > hang on a pthread_cond_wait, as back trace 2 showed: >> > > ========================== back trace 2 ================================== >> > > Thread 10 (Thread 0x43f87940 (LWP 19986)): >> > > #0 0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from >> > > /lib64/libpthread.so.0 >> > > #1 0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, >> > > mutex=@0x66d790) at ./common/Cond.h:46 >> > > #2 0x00007f147a86630d in Client::wait_on_list (this=0x66d430, >> > > ls=@0x674df0) at client/Client.cc:2140 >> > > #3 0x00007f147a8771ff in Client::get_caps (this=0x66d430, >> > > in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) >> > > at client/Client.cc:1827 >> > > #4 0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, >> > > offset=235470336, size=512, >> > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 >> > > times>...) at client/Client.cc:5055 >> > > #5 0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, >> > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 >> > > times>..., size=512, offset=235470336) at client/Client.cc:5007 >> > > #6 0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 'ÿÿ' <repeats >> > > 33 times>, ".001", 'ÿÿ' <repeats 165 times>..., size=512, >> > > offset=235470336) at libceph.cc:322 >> > > #7 0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, >> > > offset=235470336, length=512, >> > > buf=0x6452b0 'ÿÿ' <repeats 33 times>, ".001", 'ÿÿ' <repeats 165 >> > > times>..., queue=@0x644b60, io=0x6456a0) >> > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191 >> > > #8 0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, >> > > io=0x6456a0, queue=@0x644b60, netfd=5) >> > > at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268 >> > > #9 0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at >> > > /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390 >> > > #10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0 >> > > #11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6 >> > > ============================================================ >> > > >> > > Seem not very safe with the patch. >> > > >> > > So I applied this patch: >> > > diff --git a/src/client/Client.cc b/src/client/Client.cc >> > > index 7f7fb08..bf0997a 100644 >> > > --- a/src/client/Client.cc >> > > +++ b/src/client/Client.cc >> > > @@ -4934,7 +4934,6 @@ public: >> > > >> > > void Client::sync_write_commit(Inode *in) >> > > { >> > > - client_lock.Lock(); >> > > + int r = client_lock.Lock(); >> > > assert(unsafe_sync_write > 0); >> > > unsafe_sync_write--; >> > > >> > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) >> > > } >> > > >> > > put_inode(in); >> > > + if (r) >> > > client_lock.Unlock(); >> > > } >> > > >> > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) >> > > >> > > >> > > After some test, back trace 1 didn't appear, but back trace 2 still appear. >> > > Hmm,, not safe also. >> > > >> > > Thx! >> > > Simon >> > > >> > > >> > > >> > > 2011/5/19 Sage Weil <sage@newdream.net>: >> > > > Hi Simon, >> > > > >> > > > On Wed, 18 May 2011, Simon Tian wrote: >> > > > >> > > >> Hi, >> > > >> >> > > >> Could any one give me an answer? >> > > >> >> > > >> My application need to open one file with two clients in different >> > > >> hosts. One for write/read and one for read only. The client could be >> > > >> developed based on libceph or librbd. >> > > >> I tried librbd, exception appear too. >> > > > >> > > > I opened a bug for this, http://tracker.newdream.net/issues/1097. >> > > > >> > > > Can you try the patch below? I this may just be something missed in a >> > > > locking rewrite way back when in a rare code path: >> > > > >> > > > Thanks! >> > > > sage >> > > > >> > > > >> > > > >> > > > diff --git a/src/client/Client.cc b/src/client/Client.cc >> > > > index 7f7fb08..bf0997a 100644 >> > > > --- a/src/client/Client.cc >> > > > +++ b/src/client/Client.cc >> > > > @@ -4934,7 +4934,6 @@ public: >> > > > >> > > > void Client::sync_write_commit(Inode *in) >> > > > { >> > > > - client_lock.Lock(); >> > > > assert(unsafe_sync_write > 0); >> > > > unsafe_sync_write--; >> > > > >> > > > @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) >> > > > } >> > > > >> > > > put_inode(in); >> > > > - >> > > > - client_lock.Unlock(); >> > > > } >> > > > >> > > > int Client::write(int fd, const char *buf, loff_t size, loff_t offset) >> > > > >> > > > >> > > -- >> > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >> > > the body of a message to majordomo@vger.kernel.org >> > > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > >> > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
========================== back trace 1 ================================== (gdb) bt #0 0x000000367fa30265 in raise () from /lib64/libc.so.6 #1 0x000000367fa31d10 in abort () from /lib64/libc.so.6 #2 0x0000003682ebec44 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #3 0x0000003682ebcdb6 in ?? () from /usr/lib64/libstdc++.so.6 #4 0x0000003682ebcde3 in std::terminate() () from /usr/lib64/libstdc++.so.6 #5 0x0000003682ebceca in __cxa_throw () from /usr/lib64/libstdc++.so.6 #6 0x00007ffff7c51a54 in ceph::__ceph_assert_fail (assertion=0x7ffff7ce31f1 "in->cap_refs[(32 << 8)] == 1", file=0x7ffff7ce1431 "client/Client.cc", line=2190, func=0x7ffff7ce5fc0 "void Client::_flush(Inode*, Context*)") at common/assert.cc:86 #7 0x00007ffff7b39bb0 in Client::_flush (this=0x629090, in=0x630f50, onfinish=0x7fffe8000b60) at client/Client.cc:2190 #8 0x00007ffff7b52426 in Client::handle_cap_grant (this=0x629090, in=0x630f50, mds=0, cap=0x62ee50, m=0x631fc0) at client/Client.cc:2930 #9 0x00007ffff7b52d2a in Client::handle_caps (this=0x629090, m=0x631fc0) at client/Client.cc:2711 #10 0x00007ffff7b560a0 in Client::ms_dispatch (this=0x629090, m=0x631fc0) at client/Client.cc:1444 #11 0x00007ffff7bcaff9 in Messenger::ms_deliver_dispatch (this=0x628350, m=0x631fc0) at msg/Messenger.h:98 #12 0x00007ffff7bb2607 in SimpleMessenger::dispatch_entry (this=0x628350) at msg/SimpleMessenger.cc:352 #13 0x00007ffff7b22641 in SimpleMessenger::DispatchThread::entry (this=0x6287d8) at msg/SimpleMessenger.h:533 #14 0x00007ffff7b5aa04 in Thread::_entry_func (arg=0x6287d8) at ./common/Thread.h:41 #15 0x00000036802064a7 in start_thread () from /lib64/libpthread.so.0 #16 0x000000367fad3c2d in clone () from /lib64/libc.so.6 ============================================================ In addition, when writing data for a long time, the write thread will hang on a pthread_cond_wait, as back trace 2 showed: ========================== back trace 2 ================================== Thread 10 (Thread 0x43f87940 (LWP 19986)): #0 0x000000312be0ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f147a8e763b in Cond::Wait (this=0x43f86b00, mutex=@0x66d790) at ./common/Cond.h:46 #2 0x00007f147a86630d in Client::wait_on_list (this=0x66d430, ls=@0x674df0) at client/Client.cc:2140 #3 0x00007f147a8771ff in Client::get_caps (this=0x66d430, in=0x6749d0, need=4096, want=8192, got=0x43f86e8c, endoff=235470848) at client/Client.cc:1827 #4 0x00007f147a886003 in Client::_write (this=0x66d430, f=0x671510, offset=235470336, size=512, buf=0x6452b0 '?' <repeats 33 times>, ".001", '?' <repeats 165 times>...) at client/Client.cc:5055 #5 0x00007f147a886d73 in Client::write (this=0x66d430, fd=10, buf=0x6452b0 '?' <repeats 33 times>, ".001", '?' <repeats 165 times>..., size=512, offset=235470336) at client/Client.cc:5007 #6 0x00007f147a85822d in ceph_write (fd=10, buf=0x6452b0 '?' <repeats 33 times>, ".001", '?' <repeats 165 times>..., size=512, offset=235470336) at libceph.cc:322 #7 0x000000000042fa27 in TdcCephImpl::AsyncWrite (this=0x647c30, offset=235470336, length=512, buf=0x6452b0 '?' <repeats 33 times>, ".001", '?' <repeats 165 times>..., queue=@0x644b60, io=0x6456a0) at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:191 #8 0x000000000042ffc5 in TdcCephImpl::AsyncProcess (this=0x647c30, io=0x6456a0, queue=@0x644b60, netfd=5) at /opt/tsk/tdc-tapdisk/td-connector2.0/common/tdc_ceph_impl.cpp:268 #9 0x00000000004192be in SubmitRequest_Intra (arg=0x644b00) at /opt/tsk/tdc-tapdisk/td-connector2.0/td_connector_server.cpp:390 #10 0x000000312be064a7 in start_thread () from /lib64/libpthread.so.0 #11 0x000000312b6d3c2d in clone () from /lib64/libc.so.6 ============================================================ Seem not very safe with the patch. So I applied this patch: diff --git a/src/client/Client.cc b/src/client/Client.cc index 7f7fb08..bf0997a 100644 --- a/src/client/Client.cc +++ b/src/client/Client.cc @@ -4934,7 +4934,6 @@ public: void Client::sync_write_commit(Inode *in) { - client_lock.Lock(); + int r = client_lock.Lock(); assert(unsafe_sync_write > 0); unsafe_sync_write--; @@ -4947,8 +4946,6 @@ void Client::sync_write_commit(Inode *in) } put_inode(in); + if (r) client_lock.Unlock(); } int Client::write(int fd, const char *buf, loff_t size, loff_t offset)