From patchwork Thu Mar 3 20:40:08 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sage Weil
X-Patchwork-Id: 607381
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p23KcK3E015674
	for ; Thu, 3 Mar 2011 20:40:06 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758708Ab1CCUkD (ORCPT );
	Thu, 3 Mar 2011 15:40:03 -0500
Received: from cobra.newdream.net ([66.33.216.30]:52167 "EHLO
	cobra.newdream.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1758714Ab1CCUj7 (ORCPT );
	Thu, 3 Mar 2011 15:39:59 -0500
Received: from localhost.localdomain (ip-66-33-206-8.dreamhost.com [66.33.206.8])
	by cobra.newdream.net (Postfix) with ESMTPA id D0420BC950;
	Thu, 3 Mar 2011 12:41:54 -0800 (PST)
From: Sage Weil
To: ceph-devel@vger.kernel.org
Cc: Sage Weil
Subject: [PATCH 6/8] libceph: fix msgr backoff
Date: Thu, 3 Mar 2011 12:40:08 -0800
Message-Id: <1299184810-19125-7-git-send-email-sage@newdream.net>
X-Mailer: git-send-email 1.7.0
In-Reply-To: <1299184810-19125-1-git-send-email-sage@newdream.net>
References: <1299184810-19125-1-git-send-email-sage@newdream.net>
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Thu, 03 Mar 2011 20:40:06 +0000 (UTC)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index c3011be..eb31e10 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -123,6 +123,7 @@ struct ceph_msg_pos {
 #define SOCK_CLOSED     11 /* socket state changed to closed */
 #define OPENING         13 /* open connection w/ (possibly new) peer */
 #define DEAD            14 /* dead, about to kfree */
+#define BACKOFF         15
 
 /*
  * A single connection with another host.
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 6bd5025..af1ea93 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1948,6 +1948,19 @@ static void con_work(struct work_struct *work)
 	struct ceph_connection *con = container_of(work, struct ceph_connection,
 						   work.work);
 
+	if (test_and_clear_bit(BACKOFF, &con->state)) {
+		dout("con_work %p backing off\n", con);
+		if (queue_delayed_work(ceph_msgr_wq, &con->work,
+				       round_jiffies_relative(con->delay))) {
+			dout("con_work %p backoff %lu\n", con, con->delay);
+			return;
+		} else {
+			con->ops->put(con);
+			dout("con_work %p FAILED to back off %lu\n", con,
+			     con->delay);
+		}
+	}
+
 	mutex_lock(&con->mutex);
 
 	if (test_bit(CLOSED, &con->state)) { /* e.g. if we are replaced */
@@ -2017,11 +2030,24 @@ static void ceph_fault(struct ceph_connection *con)
 			con->delay = BASE_DELAY_INTERVAL;
 		else if (con->delay < MAX_DELAY_INTERVAL)
 			con->delay *= 2;
-		dout("fault queueing %p delay %lu\n", con, con->delay);
 		con->ops->get(con);
 		if (queue_delayed_work(ceph_msgr_wq, &con->work,
-				       round_jiffies_relative(con->delay)) == 0)
+				       round_jiffies_relative(con->delay))) {
+			dout("fault queued %p delay %lu\n", con, con->delay);
+		} else {
 			con->ops->put(con);
+			dout("fault failed to queue %p delay %lu, backoff\n",
+			     con, con->delay);
+			/*
+			 * In many cases we see a socket state change
+			 * while con_work is running and end up
+			 * queuing (non-delayed) work, such that we
+			 * can't backoff with a delay. Set a flag so
+			 * that when con_work restarts we schedule the
+			 * delay then.
+			 */
+			set_bit(BACKOFF, &con->state);
+		}
 	}
 
 out_unlock:
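
The deferred-backoff idea described in the comment above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct sim_con, sim_queue_delayed(), sim_fault() and sim_con_work() are hypothetical stand-ins for the connection, queue_delayed_work(), ceph_fault() and the start of con_work(); the delay constants are made up; and reference counting (the get/put calls) is left out entirely. The one property it relies on is that queuing fails when the work item is already pending.

```c
#include <stdbool.h>

/* Illustrative values only; not the kernel's BASE/MAX_DELAY_INTERVAL. */
#define SIM_BASE_DELAY 1UL
#define SIM_MAX_DELAY  64UL

/* Hypothetical, heavily reduced model of a connection. */
struct sim_con {
	unsigned long delay;        /* current backoff delay, arbitrary ticks */
	unsigned long queued_delay; /* delay the pending work was queued with */
	bool work_pending;          /* a work item is already on the queue */
	bool backoff;               /* models the BACKOFF state bit */
};

/* Models queue_delayed_work(): fails if the work is already pending. */
static bool sim_queue_delayed(struct sim_con *con, unsigned long delay)
{
	if (con->work_pending)
		return false;
	con->work_pending = true;
	con->queued_delay = delay;
	return true;
}

/* Models the fault path: grow the delay, then try to queue with it. */
static void sim_fault(struct sim_con *con)
{
	if (con->delay == 0)
		con->delay = SIM_BASE_DELAY;
	else if (con->delay < SIM_MAX_DELAY)
		con->delay *= 2;

	/* Work already pending (e.g. queued without delay): defer the backoff. */
	if (!sim_queue_delayed(con, con->delay))
		con->backoff = true;
}

/*
 * Models the start of the work function. Returns true if it only
 * re-queued itself with the backoff delay and did no other work.
 */
static bool sim_con_work(struct sim_con *con)
{
	con->work_pending = false; /* the work item is now running */

	if (con->backoff) {
		con->backoff = false; /* test-and-clear */
		if (sim_queue_delayed(con, con->delay))
			return true;
	}
	return false; /* would continue with normal connection work */
}
```

In this model, a fault that arrives while non-delayed work is pending only sets the flag; the next run of the work function then re-queues itself with the stored delay instead of retrying immediately.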