Message ID | 515B2E8C.5090001@sandia.gov (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/02/2013 01:16 PM, Jim Schutt wrote: > On 04/02/2013 12:28 PM, Joao Luis wrote: >> Right. I'll push a patch to bump that sort of output to 30 when I get home. > > Thanks - but FWIW, I don't think it's the root cause of my > issue -- more below.... OK, I see now that you're talking about the output from list_proposals(). Indeed, when I cause list_proposals() in propose_queued() to only be called at paxos debug 30, the stall in finish_queued_proposal() mostly goes away. Now, with 55392 PGs I'm seeing this: 2013-04-02 14:21:19.401820 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending 2013-04-02 14:21:19.535731 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos active c 1..33) queue_proposal bl 19870402 bytes; ctx = 0x11ecff0 2013-04-02 14:21:19.535917 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) propose_queued 34 19870402 bytes 2013-04-02 14:21:19.535925 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) begin for 34 19870402 bytes 2013-04-02 14:21:19.754326 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos updating c 1..33) commit 34 2013-04-02 14:21:19.975747 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finishing proposal 2013-04-02 14:21:19.975799 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finish it (proposal = 0x790c0c0) 2013-04-02 14:21:19.975802 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal proposal took 0.439887 to finish 2013-04-02 14:21:19.975913 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) _active >> >> If you're willing, try reducing the paxos debug level to 0 and let us know >> if those delays in the proposal go away. > > When I started debugging why I couldn't start new filesystems, several > days ago, I wasn't using any debugging. It took me until last Friday > to track the issue into paxos, which is when I turned on its debugging. > With that list_proposals() debug output issue out of the way, I'm still having trouble starting up a new filesystem -- still working to try to isolate the source of the issue.... Thanks -- Jim -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/02/2013 09:36 PM, Jim Schutt wrote: > On 04/02/2013 01:16 PM, Jim Schutt wrote: >> On 04/02/2013 12:28 PM, Joao Luis wrote: >>> Right. I'll push a patch to bump that sort of output to 30 when I get home. >> >> Thanks - but FWIW, I don't think it's the root cause of my >> issue -- more below.... > > OK, I see now that you're talking about the output from list_proposals(). > Indeed, when I cause list_proposals() in propose_queued() to only > be called at paxos debug 30, the stall in finish_queued_proposal() > mostly goes away. Now, with 55392 PGs I'm seeing this: > > 2013-04-02 14:21:19.401820 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending > 2013-04-02 14:21:19.535731 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos active c 1..33) queue_proposal bl 19870402 bytes; ctx = 0x11ecff0 > 2013-04-02 14:21:19.535917 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) propose_queued 34 19870402 bytes > 2013-04-02 14:21:19.535925 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..33) begin for 34 19870402 bytes > 2013-04-02 14:21:19.754326 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos updating c 1..33) commit 34 > 2013-04-02 14:21:19.975747 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finishing proposal > 2013-04-02 14:21:19.975799 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal finish it (proposal = 0x790c0c0) > 2013-04-02 14:21:19.975802 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..34) finish_queued_proposal proposal took 0.439887 to finish > 2013-04-02 14:21:19.975913 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) _active > >>> >>> If you're willing, try reducing the paxos debug level to 0 and let us know >>> if those delays in the proposal go away. >> >> When I started debugging why I couldn't start new filesystems, several >> days ago, I wasn't using any debugging. It took me until last Friday >> to track the issue into paxos, which is when I turned on its debugging. >> > > With that list_proposals() debug output issue out of the way, I'm > still having trouble starting up a new filesystem -- still working > to try to isolate the source of the issue.... > > Thanks -- Jim > > Good to know! With regard to your troubles, what appears to go wrong? -Joao -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/src/mon/Paxos.cc b/src/mon/Paxos.cc index bf3eb92..70a8b5d 100644 --- a/src/mon/Paxos.cc +++ b/src/mon/Paxos.cc @@ -494,6 +494,8 @@ void Paxos::begin(bufferlist& v) MonitorDBStore::Transaction t; t.put(get_name(), last_committed+1, new_value); + utime_t tstart = ceph_clock_now(g_ceph_context); + dout(30) << __func__ << " transaction dump:\n"; JSONFormatter f(true); t.dump(&f); @@ -506,7 +508,14 @@ void Paxos::begin(bufferlist& v) f.flush(*_dout); *_dout << dendl; + dout(10) << __func__ << " - JSON formatting, debug_tx decode took " + << ceph_clock_now(g_ceph_context) - tstart << dendl; + + tstart = ceph_clock_now(g_ceph_context); get_store()->apply_transaction(t); + dout(10) << __func__ << " - get_store()->apply_transaction took " + << ceph_clock_now(g_ceph_context) - tstart << dendl; + if (mon->get_quorum().size() == 1) { // we're alone, take it easy diff --git a/src/mon/PaxosService.cc b/src/mon/PaxosService.cc index a66c5ec..808ce62 100644 --- a/src/mon/PaxosService.cc +++ b/src/mon/PaxosService.cc @@ -158,6 +158,8 @@ void PaxosService::propose_pending() encode_pending(&t); have_pending = false; + utime_t tstart = ceph_clock_now(g_ceph_context); + dout(30) << __func__ << " transaction dump:\n"; JSONFormatter f(true); t.dump(&f); @@ -166,6 +168,9 @@ void PaxosService::propose_pending() t.encode(bl); + dout(10) << __func__ << " - JSON formatting, encode took " + << ceph_clock_now(g_ceph_context) - tstart << dendl; + // apply to paxos proposing.set(1); paxos->propose_new_value(bl, new C_Committed(this));