Message ID | Pine.LNX.4.64.1105171613120.30781@cobra.newdream.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 17 May 2011, Sage Weil wrote: > On Tue, 17 May 2011, Jim Schutt wrote: > > Hi Sage, > > > > Sage Weil wrote: > > > Hi Jim, > > > > > This picks up the second resent message. But and OPENING is set.. so how is > > > it that we were in try_write but we didn't pass through the OPENING test in > > > con_work? > > > > > > > > > con_work()/try_write() start again on new message. > > > > > > Note that con_work() did not notice that the connection was marked to > > > > > > be closed/reopened! This is where things went bad. > > > > > > Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING. > > > > [ 2412.643178] libceph: messenger.c:1969 : con_work ffff8801955c0030 > > > > OPENING > > > > [ 2412.643182] libceph: messenger.c:292 : con_close_socket on > > > > ffff8801955c0030 sock ffff8801825a8c00 > > > > [ 2412.643197] libceph: messenger.c:162 : ceph_state_change > > > > ffff8801955c0030 state = 2076 sk_state = 4 > > > > > > Ok, now we see it, but this should have executed before we ever got to > > > try_write! > > > > > > There are a handful of places that drop and re-take con->mutex... I think it > > > has to be one of those. They are: > > > > > > - prepare_connect_authorizer (before calling ->prepare_authorizer callback) > > > - fail_protocol (before calling ->bad_proto callback) > > > - process_connect RESETSESSION case (before calling ->reset callback) > > > - process_message (before calling ->dispatch callback) > > > > > > I think after retaking the lock in each of these cases we need to test for > > > the CLOSED or OPENING bits and, if set, bail out so that con_work can > > > restart. > > > > > > Does that make sense? Can you confirm which case of lock droppage inside > > > con_work is at fault? (Maybe adding a printk at the top of con_work after > > > taking the lock too would make it more clear what's going on; currently it > > > only prints in all the random test_bit cases). > > > > So I've got logs now that show that connection state changes > > across a call to process_message. That seems like it proves > > the concept. > > > > > > > > I'm thinking something like the below... does that seem reasonable? I only > > > did the prepare_authorizer case to start, but the others should be fixed up > > > too. Compile tested only. > > > > > > > Is there any reason not to fix all the cases at once, in try_read? > > It's less code, but far away from where the mutex dropping occurs, > > so we add a comment: > > That covers the process_message and process_connect cases, provided you > also set ret properly, and do the con_close_socket work that the con_work > checks would have done. e.g., I pushed a patch to the msgr_race branch that catches all four cases (I think). Does the fix make sense given what you saw? Thanks! sage > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index e15a82c..7ef9eb0 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -1810,6 +1810,21 @@ static int try_read(struct ceph_connection *con) > more: > dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag, > con->in_base_pos); > + > + /* > + * process_connect and process_message drop and re-take > + * con->mutex. make sure we handle a racing close or reopen. > + */ > + if (test_bit(CLOSED, &con->state)) { > + ret = -EAGAIN; > + goto out; > + } > + if (test_bit(OPENING, &con->state)) { > + con_close_socket(con); > + ret = 0; /* so we continue to try_write */ > + goto out; > + } > + > if (test_bit(CONNECTING, &con->state)) { > if (!test_bit(NEGOTIATING, &con->state)) { > dout("try_read connecting\n"); > > It's a little bit unweildly because we repeat the checks at the top of > con_work, though. It is probably cleaner to always fall back to that case > by doing CLOSED|OPENING bit checks and bailing with -EAGAIN any time we > retake con->mutex, and then checking try_read/write return values like the > patch from yesterday does. That way we only react to the bits in one > place. > > Have you verified that this fixes the problem for you? It makes sense the > race is over ->dispatch, since that is a somewhat long-running call. > > sage > > > > > > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > > index a9bd424..11d6a61 100644 > > --- a/net/ceph/messenger.c > > +++ b/net/ceph/messenger.c > > @@ -1816,10 +1820,19 @@ static int try_read(struct ceph_connection *con) > > dout("try_read start on %p state %lu\n", con, con->state); > > > > more: > > dout("try_read %p state %lu tag %d in_base_pos %d\n", con, > > con->state, (int)con->in_tag, con->in_base_pos); > > + > > + /* process_connect, process_message need to drop con->mutex, > > + * so connection state might have changed on us. Check it, > > + * leave if necessary so new state can be processed. > > + */ > > + if (test_bit(CLOSED, &con->state) || > > + test_bit(OPENING, &con->state)) > > + goto out; > > + > > if (test_bit(CONNECTING, &con->state)) { > > if (!test_bit(NEGOTIATING, &con->state)) { > > dout("try_read %p connecting\n", con); > > ret = read_partial_banner(con); > > if (ret <= 0) > > > > > > What am I missing? > > > > -- Jim > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Sage Weil wrote: > On Tue, 17 May 2011, Sage Weil wrote: >> On Tue, 17 May 2011, Jim Schutt wrote: >>> Hi Sage, >>> >>> Sage Weil wrote: >>>> Hi Jim, >>>> This picks up the second resent message. But and OPENING is set.. so how is >>>> it that we were in try_write but we didn't pass through the OPENING test in >>>> con_work? >>>> >>>>>>> con_work()/try_write() start again on new message. >>>>>>> Note that con_work() did not notice that the connection was marked to >>>>>>> be closed/reopened! This is where things went bad. >>>>>>> Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING. >>>>> [ 2412.643178] libceph: messenger.c:1969 : con_work ffff8801955c0030 >>>>> OPENING >>>>> [ 2412.643182] libceph: messenger.c:292 : con_close_socket on >>>>> ffff8801955c0030 sock ffff8801825a8c00 >>>>> [ 2412.643197] libceph: messenger.c:162 : ceph_state_change >>>>> ffff8801955c0030 state = 2076 sk_state = 4 >>>> Ok, now we see it, but this should have executed before we ever got to >>>> try_write! >>>> >>>> There are a handful of places that drop and re-take con->mutex... I think it >>>> has to be one of those. They are: >>>> >>>> - prepare_connect_authorizer (before calling ->prepare_authorizer callback) >>>> - fail_protocol (before calling ->bad_proto callback) >>>> - process_connect RESETSESSION case (before calling ->reset callback) >>>> - process_message (before calling ->dispatch callback) >>>> >>>> I think after retaking the lock in each of these cases we need to test for >>>> the CLOSED or OPENING bits and, if set, bail out so that con_work can >>>> restart. >>>> >>>> Does that make sense? Can you confirm which case of lock droppage inside >>>> con_work is at fault? (Maybe adding a printk at the top of con_work after >>>> taking the lock too would make it more clear what's going on; currently it >>>> only prints in all the random test_bit cases). >>> So I've got logs now that show that connection state changes >>> across a call to process_message. That seems like it proves >>> the concept. >>> >>>> I'm thinking something like the below... does that seem reasonable? I only >>>> did the prepare_authorizer case to start, but the others should be fixed up >>>> too. Compile tested only. >>>> >>> Is there any reason not to fix all the cases at once, in try_read? >>> It's less code, but far away from where the mutex dropping occurs, >>> so we add a comment: >> That covers the process_message and process_connect cases, provided you >> also set ret properly, and do the con_close_socket work that the con_work >> checks would have done. e.g., > > I pushed a patch to the msgr_race branch that catches all four cases (I > think). Does the fix make sense given what you saw? Yes, I think so. In every case I logged I saw that prepare_write_message was getting called at the wrong time - I finally am understanding that happened because reset_connection was able to run when con->mutex was dropped, and it thus resets con->out_msg at the wrong time. I'll test your fix out today and let you know if I see any more of this. -- Jim > > Thanks! > sage > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index e15a82c..7ef9eb0 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1810,6 +1810,21 @@ static int try_read(struct ceph_connection *con) more: dout("try_read tag %d in_base_pos %d\n", (int)con->in_tag, con->in_base_pos); + + /* + * process_connect and process_message drop and re-take + * con->mutex. make sure we handle a racing close or reopen. + */ + if (test_bit(CLOSED, &con->state)) { + ret = -EAGAIN; + goto out; + } + if (test_bit(OPENING, &con->state)) { + con_close_socket(con); + ret = 0; /* so we continue to try_write */ + goto out; + } + if (test_bit(CONNECTING, &con->state)) { if (!test_bit(NEGOTIATING, &con->state)) { dout("try_read connecting\n");