From patchwork Sun Oct 14 18:55:28 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 10640779
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D83C013AD
	for ; Sun, 14 Oct 2018 18:55:42 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9084628B9F
	for ; Sun, 14 Oct 2018 18:55:42 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 8260D291F2; Sun, 14 Oct 2018 18:55:42 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0861828B9F
	for ; Sun, 14 Oct 2018 18:55:41 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CD28921F5E8;
	Sun, 14 Oct 2018 11:55:39 -0700 (PDT)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 2556821F2FA
	for ; Sun, 14 Oct 2018 11:55:38 -0700 (PDT)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id D2E8E2238;
	Sun, 14 Oct 2018 14:55:34 -0400 (EDT)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id CEAFA2AC; Sun, 14 Oct 2018 14:55:34 -0400 (EDT)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Sun, 14 Oct 2018 14:55:28 -0400
Message-Id: <1539543332-28679-7-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
References: <1539543332-28679-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 06/10] lustre: lnd: calculate qp max_send_wrs
 properly
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Alexey Lyashkov , Amir Shehata , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"
X-Virus-Scanned: ClamAV using ClamSMTP

From: Amir Shehata

The maximum number of in-flight transfers cannot exceed the negotiated
queue depth. Instead of calculating max_send_wrs as
(negotiated number of frags + 1) * concurrent sends, it should be
(negotiated number of frags + 1) * queue depth.

If that value is too large for successful qp creation, we reduce the
queue depth in a loop until we either create the qp successfully or the
queue depth dips below 2.

Due to the queue depth negotiation protocol, the queue depth is
guaranteed to match on the active and passive sides.

This change resolves the discrepancy created by the previous code, which
reduces max_send_wr by a quarter on each failed attempt. That could lead
to the following error when o2iblnd transfers a message which requires
more WRs than the maximum that has been allocated:

  mlx5_ib_post_send:4184:(pid 26272): Failed to prepare WQE

Signed-off-by: Amir Shehata
Signed-off-by: Alexey Lyashkov
WC-bug-id: https://jira.whamcloud.com/browse/LU-10213
Reviewed-on: https://review.whamcloud.com/30310
Reviewed-by: Alexey Lyashkov
Reviewed-by: Dmitry Eremin
Reviewed-by: Doug Oucharek
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c    | 30 +++++++++++++++++-----
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +--
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
index 99a4650..43266d8 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c
@@ -650,6 +650,19 @@ static struct kib_sched_info *kiblnd_get_scheduler(int cpt)
 	return NULL;
 }
 
+static unsigned int kiblnd_send_wrs(struct kib_conn *conn)
+{
+	/*
+	 * One WR for the LNet message
+	 * And ibc_max_frags for the transfer WRs
+	 */
+	unsigned int ret = 1 + conn->ibc_max_frags;
+
+	/* account for a maximum of ibc_queue_depth in-flight transfers */
+	ret *= conn->ibc_queue_depth;
+	return ret;
+}
+
 struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 				    struct rdma_cm_id *cmid,
 				    int state, int version)
@@ -801,8 +814,6 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 
 	init_qp_attr->event_handler = kiblnd_qp_event;
 	init_qp_attr->qp_context = conn;
-	init_qp_attr->cap.max_send_wr = IBLND_SEND_WRS(conn);
-	init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn);
 	init_qp_attr->cap.max_send_sge = *kiblnd_tunables.kib_wrq_sge;
 	init_qp_attr->cap.max_recv_sge = 1;
 	init_qp_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
@@ -813,11 +824,14 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 	conn->ibc_sched = sched;
 
 	do {
+		init_qp_attr->cap.max_send_wr = kiblnd_send_wrs(conn);
+		init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn);
+
 		rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
-		if (!rc || init_qp_attr->cap.max_send_wr < 16)
+		if (!rc || conn->ibc_queue_depth < 2)
 			break;
 
-		init_qp_attr->cap.max_send_wr -= init_qp_attr->cap.max_send_wr / 4;
+		conn->ibc_queue_depth--;
 	} while (rc);
 
 	if (rc) {
@@ -829,9 +843,11 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni,
 		goto failed_2;
 	}
 
-	if (init_qp_attr->cap.max_send_wr != IBLND_SEND_WRS(conn))
-		CDEBUG(D_NET, "original send wr %d, created with %d\n",
-		       IBLND_SEND_WRS(conn), init_qp_attr->cap.max_send_wr);
+	if (conn->ibc_queue_depth != peer_ni->ibp_queue_depth)
+		CWARN("peer %s - queue depth reduced from %u to %u to allow for qp creation\n",
+		      libcfs_nid2str(peer_ni->ibp_nid),
+		      peer_ni->ibp_queue_depth,
+		      conn->ibc_queue_depth);
 
 	kfree(init_qp_attr);
 
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index cd64cfb..c6c8106 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -139,9 +139,7 @@ struct kib_tunables {
 
 /* WRs and CQEs (per connection) */
 #define IBLND_RECV_WRS(c)	IBLND_RX_MSGS(c)
-#define IBLND_SEND_WRS(c)	\
-	((c->ibc_max_frags + 1) * kiblnd_concurrent_sends(c->ibc_version, \
-							  c->ibc_peer->ibp_ni))
+
 #define IBLND_CQ_ENTRIES(c)	\
 	(IBLND_RECV_WRS(c) + 2 * kiblnd_concurrent_sends(c->ibc_version, \
 							 c->ibc_peer->ibp_ni))
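
Illustrative note, not part of the patch: the sizing and retry logic in
kiblnd_create_conn() can be sketched in userspace C as below. The
demo_* names, the two-field struct and the device_cap parameter are
hypothetical stand-ins for struct kib_conn and for whatever limit makes
rdma_create_qp() fail. A real device may reject a request for other
reasons, so this is a rough model under those assumptions only.

```c
/* Hypothetical stand-in for the two struct kib_conn fields used here. */
struct demo_conn {
	unsigned int max_frags;   /* negotiated fragments per transfer */
	unsigned int queue_depth; /* negotiated in-flight message limit */
};

/* Same arithmetic as kiblnd_send_wrs(): (1 + frags) * queue depth. */
static unsigned int demo_send_wrs(const struct demo_conn *conn)
{
	return (1 + conn->max_frags) * conn->queue_depth;
}

/*
 * Assumed device model: creation fails when the request exceeds a
 * single send WR cap. This replaces rdma_create_qp() for the sketch.
 */
static int demo_create_qp(unsigned int max_send_wr, unsigned int device_cap)
{
	return max_send_wr <= device_cap ? 0 : -1;
}

/*
 * Mirrors the patched do/while loop: recompute the WR count from the
 * current queue depth on every attempt, and lower the queue depth by
 * one after each failure until creation succeeds or depth is below 2.
 * Returns 0 on success, -1 if even the smallest request failed.
 */
static int demo_negotiate(struct demo_conn *conn, unsigned int device_cap)
{
	int rc;

	do {
		rc = demo_create_qp(demo_send_wrs(conn), device_cap);
		if (!rc || conn->queue_depth < 2)
			break;

		conn->queue_depth--;
	} while (rc);

	return rc;
}
```

With max_frags = 256 and queue_depth = 8 the first request is
257 * 8 = 2056 send WRs. Against an assumed cap of 1600 the loop settles
on queue_depth = 6 (1542 WRs), so the WR count stays an exact multiple
of the per-message cost instead of drifting as it did with the old
quarter-step reduction.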