Trouble with paxos service for large PG count

On 04/02/2013 12:28 PM, Joao Luis wrote:
> Right.  I'll push a patch to bump that sort of output to 30 when I get home.

Thanks - but FWIW, I don't think it's the root cause of my
issue -- more below....
> 
> If you're willing, try reducing the paxos debug level to 0 and let us know
> if those delays in the proposal go away.

When I started debugging why I couldn't start new filesystems, several
days ago, I wasn't using any debugging.  It took me until last Friday
to track the issue into paxos, which is when I turned on its debugging.

Also, I've added the following bits of debug output:

which produces this:

2013-04-02 13:04:22.311743 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-02 13:04:22.426810 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending - JSON formatting, encode took 0.000170
2013-04-02 13:04:22.426874 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 19775198 bytes; ctx = 0x11e81f0
2013-04-02 13:04:22.426999 7ffff7fbe780  5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 19775198 bytes
2013-04-02 13:04:22.427003 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-02 13:07:51.190312 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 19775198 bytes
2013-04-02 13:07:51.190860 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) begin - JSON formatting, debug_tx decode took 0.000002
2013-04-02 13:07:51.402134 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) begin - get_store()->apply_transaction took 0.211275
2013-04-02 13:07:51.402178 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-02 13:07:51.635113 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-02 13:07:51.635152 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1261340)
2013-04-02 13:07:51.635159 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 209.208162 to finish
2013-04-02 13:07:51.635193 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active

I thought I was timing the same JSON formatting in propose_pending()
and begin(), but they take much different amounts of time.
I don't understand why yet.....

Also, I just noticed the elapsed time between these two lines:

2013-04-02 13:04:22.427003 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-02 13:07:51.190312 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 19775198 bytes

It accounts for nearly the whole of the delay that finish_queued_proposal()
is reporting.  Why did begin() take so long to start, I wonder ???

-- Jim

> 
>   -Joao

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Trouble with paxos service for large PG count

Commit Message

Comments

Patch