From patchwork Tue May 17 23:27:19 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sage Weil X-Patchwork-Id: 792702 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter2.kernel.org (8.14.4/8.14.3) with ESMTP id p4HNQHrZ023225 for ; Tue, 17 May 2011 23:26:18 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756346Ab1EQX0Q (ORCPT ); Tue, 17 May 2011 19:26:16 -0400 Received: from cobra.newdream.net ([66.33.216.30]:47549 "EHLO cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751984Ab1EQX0Q (ORCPT ); Tue, 17 May 2011 19:26:16 -0400 Received: from cobra.newdream.net (localhost [127.0.0.1]) by cobra.newdream.net (Postfix) with ESMTP id 737ECBC6C4; Tue, 17 May 2011 16:27:20 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=newdream.net; h=date:from:to:cc :subject:in-reply-to:message-id:references:mime-version: content-type; q=dns; s=newdream.net; b=xhA1zm0Y0e8IhZSm7P8FS1A6m yv6tTX3KNqBENNQOhyR/eYnCQIOfetn+EqMGNiJ8cBOWKXOh5zM5rRlHExeGAqxs FvDJSB6MhXHNwMm0Rl9XY+MgnH0h+2LFgRROu475jkpRMoshdGvn7UCAwpZOp9X2 RPQE4iTeif7POcEYfQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=newdream.net; h=date:from :to:cc:subject:in-reply-to:message-id:references:mime-version: content-type; s=newdream.net; bh=8PoeJt/P7hC6tfZBj2G1tlUxyfQ=; b=Jd2p3mP6y0W7hYlJ6CHaBmE2W/Esye3yhCBNXeE78JqE82pu20hYFmy8kj6QA 4RdMuKEbLHdlgUKu3scmA/eTuOMfATc3JLVknocSzvyDGq0Cn860ajksDHDivGxk z6ZwtlXVckyEO5iQuFDc9qKNDdrTwDTM5hj3f0F8+BSNAs= Received: by cobra.newdream.net (Postfix, from userid 1031) id 4A1BEBC7E4; Tue, 17 May 2011 16:27:20 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by cobra.newdream.net (Postfix) with ESMTP id 38241BC6C4; Tue, 17 May 2011 16:27:20 -0700 (PDT) Date: Tue, 17 May 2011 16:27:19 -0700 (PDT) From: Sage Weil To: Jim Schutt cc: ceph-devel@vger.kernel.org Subject: Re: [PATCH 2/2] libceph: fix handle_timeout() racing with con_work()/try_write() In-Reply-To: <4DD2F773.30508@sandia.gov> Message-ID: References: <1305235954-9860-2-git-send-email-jaschut@sandia.gov> <4DD2F773.30508@sandia.gov> MIME-Version: 1.0 Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter2.kernel.org [140.211.167.43]); Tue, 17 May 2011 23:26:18 +0000 (UTC) On Tue, 17 May 2011, Jim Schutt wrote: > Hi Sage, > > Sage Weil wrote: > > Hi Jim, > > > This picks up the second resent message. But and OPENING is set.. so how is > > it that we were in try_write but we didn't pass through the OPENING test in > > con_work? > > > > > > > con_work()/try_write() start again on new message. > > > > > Note that con_work() did not notice that the connection was marked to > > > > > be closed/reopened! This is where things went bad. > > > > > Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING. > > > [ 2412.643178] libceph: messenger.c:1969 : con_work ffff8801955c0030 > > > OPENING > > > [ 2412.643182] libceph: messenger.c:292 : con_close_socket on > > > ffff8801955c0030 sock ffff8801825a8c00 > > > [ 2412.643197] libceph: messenger.c:162 : ceph_state_change > > > ffff8801955c0030 state = 2076 sk_state = 4 > > > > Ok, now we see it, but this should have executed before we ever got to > > try_write! > > > > There are a handful of places that drop and re-take con->mutex... I think it > > has to be one of those. They are: > > > > - prepare_connect_authorizer (before calling ->prepare_authorizer callback) > > - fail_protocol (before calling ->bad_proto callback) > > - process_connect RESETSESSION case (before calling ->reset callback) > > - process_message (before calling ->dispatch callback) > > > > I think after retaking the lock in each of these cases we need to test for > > the CLOSED or OPENING bits and, if set, bail out so that con_work can > > restart. > > > > Does that make sense? Can you confirm which case of lock droppage inside > > con_work is at fault? (Maybe adding a printk at the top of con_work after > > taking the lock too would make it more clear what's going on; currently it > > only prints in all the random test_bit cases). > > So I've got logs now that show that connection state changes > across a call to process_message. That seems like it proves > the concept. > > > > > I'm thinking something like the below... does that seem reasonable? I only > > did the prepare_authorizer case to start, but the others should be fixed up > > too. Compile tested only. > > > > Is there any reason not to fix all the cases at once, in try_read? > It's less code, but far away from where the mutex dropping occurs, > so we add a comment: That covers the process_message and process_connect cases, provided you also set ret properly, and do the con_close_socket work that the con_work checks would have done. e.g., It's a little bit unweildly because we repeat the checks at the top of con_work, though. It is probably cleaner to always fall back to that case by doing CLOSED|OPENING bit checks and bailing with -EAGAIN any time we retake con->mutex, and then checking try_read/write return values like the patch from yesterday does. That way we only react to the bits in one place. Have you verified that this fixes the problem for you? It makes sense the race is over ->dispatch, since that is a somewhat long-running call. sage > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index a9bd424..11d6a61 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -1816,10 +1820,19 @@ static int try_read(struct ceph_connection *con) > dout("try_read start on %p state %lu\n", con, con->state); > > more: > dout("try_read %p state %lu tag %d in_base_pos %d\n", con, > con->state, (int)con->in_tag, con->in_base_pos); > + > + /* process_connect, process_message need to drop con->mutex, > + * so connection state might have changed on us. Check it, > + * leave if necessary so new state can be processed. > + */ > + if (test_bit(CLOSED, &con->state) || > + test_bit(OPENING, &con->state)) > + goto out; > + > if (test_bit(CONNECTING, &con->state)) { > if (!test_bit(NEGOTIATING, &con->state)) { > dout("try_read %p connecting\n", con); > ret = read_partial_banner(con); > if (ret <= 0) > > > What am I missing? > > -- Jim > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > --- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index e15a82c..7ef9eb0 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1810,6 +1810,21 @@ static int try_read(struct ceph_connection *con) more: dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag, con->in_base_pos); + + /* + * process_connect and process_message drop and re-take + * con->mutex. make sure we handle a racing close or reopen. + */ + if (test_bit(CLOSED, &con->state)) { + ret = -EAGAIN; + goto out; + } + if (test_bit(OPENING, &con->state)) { + con_close_socket(con); + ret = 0; /* so we continue to try_write */ + goto out; + } + if (test_bit(CONNECTING, &con->state)) { if (!test_bit(NEGOTIATING, &con->state)) { dout("try_read connecting\n");