From patchwork Wed Jan 5 01:04:03 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arlin Davis
X-Patchwork-Id: 452551
From: "Davis, Arlin R"
To: linux-rdma, "ofw@lists.openfabrics.org"
Date: Tue, 4 Jan 2011 17:04:03 -0800
Subject: [PATCH] uDAPL v2.0: scm: retry socket connect on ECONNREFUSED under heavy load
X-Mailing-List: linux-rdma@vger.kernel.org

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index f82d0ff..b95db30 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -390,6 +390,7 @@ static dp_ib_cm_handle_t dapli_cm_alloc(DAPL_EP *ep_ptr)
 
 	cm_ptr->msg.ver = htons(DCM_VER);
 	cm_ptr->socket = DAPL_INVALID_SOCKET;
+	cm_ptr->retry = SCM_CR_RETRY;
 	dapls_cm_acquire(cm_ptr);
 
 	/* Link EP and CM */
@@ -507,10 +508,11 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 				&cm_ptr->addr)->sin_addr),
 			 ntohs(((struct sockaddr_in *)
 				&cm_ptr->addr)->sin_port),
-			 err == ETIMEDOUT ? "RETRYING...":"ABORTING");
+			 (err == ETIMEDOUT || err == ECONNREFUSED) ?
+			 "RETRYING...":"ABORTING");
 
 		/* retry a timeout */
-		if (err == ETIMEDOUT) {
+		if ((err == ETIMEDOUT) || (err == ECONNREFUSED && --cm_ptr->retry)) {
 			closesocket(cm_ptr->socket);
 			cm_ptr->socket = DAPL_INVALID_SOCKET;
 			dapli_socket_connect(cm_ptr->ep, (DAT_IA_ADDRESS_PTR)&cm_ptr->addr,
diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h
index 4bb1a4a..5f9fb43 100644
--- a/dapl/openib_scm/dapl_ib_util.h
+++ b/dapl/openib_scm/dapl_ib_util.h
@@ -40,6 +40,7 @@ struct ib_cm_handle
 	DAPL_OS_LOCK lock;
 	int ref_count;
 	int state;
+	int retry;
 	DAPL_SOCKET socket;
 	struct dapl_hca *hca;
 	struct dapl_sp *sp;
@@ -63,6 +64,7 @@ typedef dp_ib_cm_handle_t ib_cm_srvc_handle_t;
 #define SCM_ACK_RETRY 7 /* 3 bits, 7 * 268ms = 1.8 seconds */
 #define SCM_RNR_TIMER 12 /* 5 bits, 12 =.64ms, 28 =163ms, 31 =491ms */
 #define SCM_RNR_RETRY 7 /* 3 bits, 7 == infinite */
+#define SCM_CR_RETRY 5 /* retries for busy server, connect refused */
 #define SCM_IB_MTU 2048
 
 /* Global routing defaults */