From patchwork Wed Sep 26 02:47:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615109 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 996DE161F for ; Wed, 26 Sep 2018 02:48:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8657B2A6D9 for ; Wed, 26 Sep 2018 02:48:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7999C2A81A; Wed, 26 Sep 2018 02:48:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id EBAC52A6D9 for ; Wed, 26 Sep 2018 02:48:29 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3ED7121F585; Tue, 25 Sep 2018 19:48:27 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 020CC21F499 for ; Tue, 25 Sep 2018 19:48:20 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 30BF01005367; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 25B7D2DC; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:53 -0400 Message-Id: <1537930097-11624-2-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 01/25] lustre: lnet: remove ni from lnet_finalize X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Sergey Cheremencev Remove ni from lnet_finalize and kiblnd_txlist_done input arguments. Also small code cleanup by introducing ibprm_cookie to avoid checkpatch issues. Signed-off-by: Sergey Cheremencev WC-bug-id: https://jira.whamcloud.com/browse/LU-9094 Seagate-bug-id: MRP-4056 Reviewed-on: https://review.whamcloud.com/25375 Reviewed-by: Doug Oucharek Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 2 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 2 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 3 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 65 ++++++++++------------ .../staging/lustre/lnet/klnds/socklnd/socklnd.c | 3 +- .../staging/lustre/lnet/klnds/socklnd/socklnd_cb.c | 4 +- drivers/staging/lustre/lnet/lnet/lib-move.c | 20 +++---- drivers/staging/lustre/lnet/lnet/lib-msg.c | 2 +- drivers/staging/lustre/lnet/lnet/lo.c | 4 +- drivers/staging/lustre/lnet/lnet/net_fault.c | 2 +- 10 files changed, 50 insertions(+), 57 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index aedc88c..53cbf6d 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -537,7 +537,7 @@ struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni, void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg, unsigned int len); -void lnet_finalize(struct lnet_ni *ni, struct lnet_msg *msg, int rc); +void lnet_finalize(struct lnet_msg *msg, int rc); void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private, unsigned int nob); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 75a7e96..b3a4344 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -521,7 +521,7 @@ static int kiblnd_del_peer(struct lnet_ni *ni, lnet_nid_t nid) write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(ni, &zombies, -EIO); + kiblnd_txlist_done(&zombies, -EIO); return rc; } diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index b1851b5..a3d89ec 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -1034,8 +1034,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, void kiblnd_close_conn_locked(struct kib_conn *conn, int error); void kiblnd_launch_tx(struct lnet_ni *ni, struct kib_tx *tx, lnet_nid_t nid); -void kiblnd_txlist_done(struct lnet_ni *ni, struct list_head *txlist, - int status); +void kiblnd_txlist_done(struct list_head *txlist, int status); void kiblnd_qp_event(struct ib_event *event, void *arg); void kiblnd_cq_event(struct ib_event *event, void *arg); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index cb752dc..debed17 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -54,14 +54,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, static void kiblnd_check_sends_locked(struct kib_conn *conn); static void -kiblnd_tx_done(struct lnet_ni *ni, struct kib_tx *tx) +kiblnd_tx_done(struct kib_tx *tx) { struct lnet_msg *lntmsg[2]; - struct kib_net *net = ni->ni_data; int rc; int i; - LASSERT(net); LASSERT(!in_interrupt()); LASSERT(!tx->tx_queued); /* mustn't be queued for sending */ LASSERT(!tx->tx_sending); /* mustn't be awaiting sent callback */ @@ -76,8 +74,6 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, rc = tx->tx_status; if (tx->tx_conn) { - LASSERT(ni == tx->tx_conn->ibc_peer->ibp_ni); - kiblnd_conn_decref(tx->tx_conn); tx->tx_conn = NULL; } @@ -92,12 +88,12 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, if (!lntmsg[i]) continue; - lnet_finalize(ni, lntmsg[i], rc); + lnet_finalize(lntmsg[i], rc); } } void -kiblnd_txlist_done(struct lnet_ni *ni, struct list_head *txlist, int status) +kiblnd_txlist_done(struct list_head *txlist, int status) { struct kib_tx *tx; @@ -108,7 +104,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, /* complete now */ tx->tx_waiting = 0; tx->tx_status = status; - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); } } @@ -281,7 +277,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, spin_unlock(&conn->ibc_lock); if (idle) - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); } static void @@ -794,7 +790,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, * posted NOOPs complete */ spin_unlock(&conn->ibc_lock); - kiblnd_tx_done(peer_ni->ibp_ni, tx); + kiblnd_tx_done(tx); spin_lock(&conn->ibc_lock); CDEBUG(D_NET, "%s(%d): redundant or enough NOOP\n", libcfs_nid2str(peer_ni->ibp_nid), @@ -888,7 +884,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, kiblnd_close_conn(conn, rc); if (done) - kiblnd_tx_done(peer_ni->ibp_ni, tx); + kiblnd_tx_done(tx); spin_lock(&conn->ibc_lock); @@ -1007,7 +1003,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, spin_unlock(&conn->ibc_lock); if (idle) - kiblnd_tx_done(conn->ibc_peer->ibp_ni, tx); + kiblnd_tx_done(tx); } static void @@ -1343,7 +1339,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CWARN("Abort reconnection of %s: %s\n", libcfs_nid2str(peer_ni->ibp_nid), reason); - kiblnd_txlist_done(peer_ni->ibp_ni, &txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED); return false; } @@ -1421,7 +1417,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (tx) { tx->tx_status = -EHOSTUNREACH; tx->tx_waiting = 0; - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); } return; } @@ -1557,7 +1553,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup GET sink for %s: %d\n", libcfs_nid2str(target.nid), rc); - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); return -EIO; } @@ -1571,7 +1567,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (!tx->tx_lntmsg[1]) { CERROR("Can't create reply for GET -> %s\n", libcfs_nid2str(target.nid)); - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); return -EIO; } @@ -1606,7 +1602,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup PUT src for %s: %d\n", libcfs_nid2str(target.nid), rc); - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); return -EIO; } @@ -1697,7 +1693,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (!nob) { /* No RDMA: local completion may happen now! */ - lnet_finalize(ni, lntmsg, 0); + lnet_finalize(lntmsg, 0); } else { /* RDMA: lnet_finalize(lntmsg) when it completes */ tx->tx_lntmsg[0] = lntmsg; @@ -1707,9 +1703,9 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, return; failed_1: - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); failed_0: - lnet_finalize(ni, lntmsg, -EIO); + lnet_finalize(lntmsg, -EIO); } int @@ -1722,6 +1718,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, struct kib_tx *tx; int nob; int post_credit = IBLND_POSTRX_PEER_CREDIT; + u64 ibprm_cookie; int rc = 0; LASSERT(iov_iter_count(to) <= rlen); @@ -1750,17 +1747,18 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, } rc = 0; - lnet_finalize(ni, lntmsg, 0); + lnet_finalize(lntmsg, 0); break; case IBLND_MSG_PUT_REQ: { + u64 ibprm_cookie = rxmsg->ibm_u.putreq.ibprm_cookie; struct kib_msg *txmsg; struct kib_rdma_desc *rd; if (!iov_iter_count(to)) { - lnet_finalize(ni, lntmsg, 0); - kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, 0, - rxmsg->ibm_u.putreq.ibprm_cookie); + lnet_finalize(lntmsg, 0); + kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, + 0, ibprm_cookie); break; } @@ -1788,15 +1786,15 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (rc) { CERROR("Can't setup PUT sink for %s: %d\n", libcfs_nid2str(conn->ibc_peer->ibp_nid), rc); - kiblnd_tx_done(ni, tx); + kiblnd_tx_done(tx); /* tell peer_ni it's over */ - kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, rc, - rxmsg->ibm_u.putreq.ibprm_cookie); + kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, + rc, ibprm_cookie); break; } nob = offsetof(struct kib_putack_msg, ibpam_rd.rd_frags[rd->rd_nfrags]); - txmsg->ibm_u.putack.ibpam_src_cookie = rxmsg->ibm_u.putreq.ibprm_cookie; + txmsg->ibm_u.putack.ibpam_src_cookie = ibprm_cookie; txmsg->ibm_u.putack.ibpam_dst_cookie = tx->tx_cookie; kiblnd_init_tx_msg(ni, tx, IBLND_MSG_PUT_ACK, nob); @@ -1817,8 +1815,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, } else { /* GET didn't match anything */ kiblnd_send_completion(rx->rx_conn, IBLND_MSG_GET_DONE, - -ENODATA, - rxmsg->ibm_u.get.ibgm_cookie); + -ENODATA, ibprm_cookie); } break; } @@ -2016,7 +2013,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, spin_unlock(&conn->ibc_lock); - kiblnd_txlist_done(conn->ibc_peer->ibp_ni, &zombies, -ECONNABORTED); + kiblnd_txlist_done(&zombies, -ECONNABORTED); } static void @@ -2098,7 +2095,7 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, CNETERR("Deleting messages for %s: connection failed\n", libcfs_nid2str(peer_ni->ibp_nid)); - kiblnd_txlist_done(peer_ni->ibp_ni, &zombies, -EHOSTUNREACH); + kiblnd_txlist_done(&zombies, -EHOSTUNREACH); } static void @@ -2170,13 +2167,11 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, if (!kiblnd_peer_active(peer_ni) || /* peer_ni has been deleted */ conn->ibc_comms_error) { /* error has happened already */ - struct lnet_ni *ni = peer_ni->ibp_ni; - /* start to shut down connection */ kiblnd_close_conn_locked(conn, -ECONNABORTED); write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); - kiblnd_txlist_done(ni, &txs, -ECONNABORTED); + kiblnd_txlist_done(&txs, -ECONNABORTED); return; } diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index 534ba84..9b9cc87 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -1653,8 +1653,7 @@ struct ksock_peer * &conn->ksnc_ipaddr, conn->ksnc_port, iov_iter_count(&conn->ksnc_rx_to), conn->ksnc_rx_nob_left, ktime_get_seconds() - last_rcv); - lnet_finalize(conn->ksnc_peer->ksnp_ni, - conn->ksnc_cookie, -EIO); + lnet_finalize(conn->ksnc_cookie, -EIO); break; case SOCKNAL_RX_LNET_HEADER: if (conn->ksnc_rx_started) diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c index 1bf0170..2e99a17 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c @@ -343,7 +343,7 @@ struct ksock_tx * ksocknal_free_tx(tx); if (lnetmsg) /* KSOCK_MSG_NOOP go without lnetmsg */ - lnet_finalize(ni, lnetmsg, rc); + lnet_finalize(lnetmsg, rc); } void @@ -1226,7 +1226,7 @@ struct ksock_route * le64_to_cpu(lhdr->src_nid) != id->nid); } - lnet_finalize(conn->ksnc_peer->ksnp_ni, conn->ksnc_cookie, rc); + lnet_finalize(conn->ksnc_cookie, rc); if (rc) { ksocknal_new_packet(conn, 0); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index d39331f..a213387 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -408,7 +408,7 @@ } rc = ni->ni_net->net_lnd->lnd_recv(ni, private, msg, delayed, &to, rlen); if (rc < 0) - lnet_finalize(ni, msg, rc); + lnet_finalize(msg, rc); } static void @@ -462,7 +462,7 @@ rc = ni->ni_net->net_lnd->lnd_send(ni, priv, msg); if (rc < 0) - lnet_finalize(ni, msg, rc); + lnet_finalize(msg, rc); } static int @@ -637,7 +637,7 @@ CNETERR("Dropping message for %s: peer not alive\n", libcfs_id2str(msg->msg_target)); if (do_send) - lnet_finalize(ni, msg, -EHOSTUNREACH); + lnet_finalize(msg, -EHOSTUNREACH); lnet_net_lock(cpt); return -EHOSTUNREACH; @@ -650,7 +650,7 @@ CNETERR("Aborting message for %s: LNetM[DE]Unlink() already called on the MD/ME.\n", libcfs_id2str(msg->msg_target)); if (do_send) - lnet_finalize(ni, msg, -ECANCELED); + lnet_finalize(msg, -ECANCELED); lnet_net_lock(cpt); return -ECANCELED; @@ -915,7 +915,7 @@ lnet_ni_recv(msg->msg_rxni, msg->msg_private, NULL, 0, 0, 0, msg->msg_hdr.payload_length); list_del_init(&msg->msg_list); - lnet_finalize(NULL, msg, -ECANCELED); + lnet_finalize(msg, -ECANCELED); } lnet_net_lock(cpt); @@ -1914,7 +1914,7 @@ libcfs_nid2str(ni->ni_nid), libcfs_id2str(info.mi_id), rc); - lnet_finalize(ni, msg, rc); + lnet_finalize(msg, rc); } return 0; @@ -2402,7 +2402,7 @@ free_drop: LASSERT(!msg->msg_md); - lnet_finalize(ni, msg, rc); + lnet_finalize(msg, rc); drop: lnet_drop_message(ni, cpt, private, payload_length); @@ -2447,7 +2447,7 @@ * but we still should give error code so lnet_msg_decommit() * can skip counters operations and other checks. */ - lnet_finalize(msg->msg_rxni, msg, -ENOENT); + lnet_finalize(msg, -ENOENT); } } @@ -2605,7 +2605,7 @@ if (rc) { CNETERR("Error sending PUT to %s: %d\n", libcfs_id2str(target), rc); - lnet_finalize(NULL, msg, rc); + lnet_finalize(msg, rc); } /* completion will be signalled by an event */ @@ -2804,7 +2804,7 @@ struct lnet_msg * if (rc < 0) { CNETERR("Error sending GET to %s: %d\n", libcfs_id2str(target), rc); - lnet_finalize(NULL, msg, rc); + lnet_finalize(msg, rc); } /* completion will be signalled by an event */ diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c index aa28b6a..00be9ab 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-msg.c +++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c @@ -452,7 +452,7 @@ } void -lnet_finalize(struct lnet_ni *ni, struct lnet_msg *msg, int status) +lnet_finalize(struct lnet_msg *msg, int status) { struct lnet_msg_container *container; int my_slot; diff --git a/drivers/staging/lustre/lnet/lnet/lo.c b/drivers/staging/lustre/lnet/lnet/lo.c index 8167980..c8a1eb62 100644 --- a/drivers/staging/lustre/lnet/lnet/lo.c +++ b/drivers/staging/lustre/lnet/lnet/lo.c @@ -62,10 +62,10 @@ sendmsg->msg_offset, iov_iter_count(to)); - lnet_finalize(ni, lntmsg, 0); + lnet_finalize(lntmsg, 0); } - lnet_finalize(ni, sendmsg, 0); + lnet_finalize(sendmsg, 0); return 0; } diff --git a/drivers/staging/lustre/lnet/lnet/net_fault.c b/drivers/staging/lustre/lnet/lnet/net_fault.c index 17891f6..3841bac 100644 --- a/drivers/staging/lustre/lnet/lnet/net_fault.c +++ b/drivers/staging/lustre/lnet/lnet/net_fault.c @@ -633,7 +633,7 @@ struct delay_daemon_data { } lnet_drop_message(ni, cpt, msg->msg_private, msg->msg_len); - lnet_finalize(ni, msg, rc); + lnet_finalize(msg, rc); } } From patchwork Wed Sep 26 02:47:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615107 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC08C13A4 for ; Wed, 26 Sep 2018 02:48:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DA4432A69D for ; Wed, 26 Sep 2018 02:48:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CEA962A6B8; Wed, 26 Sep 2018 02:48:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 66C1E2A69D for ; Wed, 26 Sep 2018 02:48:26 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B61821F4FC; Tue, 25 Sep 2018 19:48:24 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 555E821F499 for ; Tue, 25 Sep 2018 19:48:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 32B5A100536B; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2861D2DE; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:54 -0400 Message-Id: <1537930097-11624-3-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/25] lustre: lnet: Allow min stats to be reset in peers and nis X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Doug Oucharek Allow writes to the peers and nis LNet procfs files to reset the mininum stat columns. Signed-off-by: Doug Oucharek WC-bug-id: https://jira.whamcloud.com/browse/LU-7214 Reviewed-on: https://review.whamcloud.com/20470 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/router_proc.c | 69 +++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c index a887ca4..4ddd35b 100644 --- a/drivers/staging/lustre/lnet/lnet/router_proc.c +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c @@ -393,7 +393,7 @@ static int proc_lnet_peers(struct ctl_table *table, int write, { const int tmpsiz = 256; struct lnet_peer_table *ptable; - char *tmpstr; + char *tmpstr = NULL; char *s; int cpt = LNET_PROC_CPT_GET(*ppos); int ver = LNET_PROC_VER_GET(*ppos); @@ -402,12 +402,33 @@ static int proc_lnet_peers(struct ctl_table *table, int write, int rc = 0; int len; - BUILD_BUG_ON(LNET_PROC_HASH_BITS < LNET_PEER_HASH_BITS); - LASSERT(!write); + if (write) { + struct lnet_peer_ni *peer; + int i; + + cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) { + lnet_net_lock(i); + for (hash = 0; hash < LNET_PEER_HASH_SIZE; hash++) { + list_for_each_entry(peer, + &ptable->pt_hash[hash], + lpni_hashlist) { + peer->lpni_mintxcredits = + peer->lpni_txcredits; + peer->lpni_minrtrcredits = + peer->lpni_rtrcredits; + } + } + lnet_net_unlock(i); + } + *ppos += *lenp; + return 0; + } if (!*lenp) return 0; + BUILD_BUG_ON(LNET_PROC_HASH_BITS < LNET_PEER_HASH_BITS); + if (cpt >= LNET_CPT_NUMBER) { *lenp = 0; return 0; @@ -627,11 +648,45 @@ static int proc_lnet_nis(struct ctl_table *table, int write, char *s; int len; - LASSERT(!write); - if (!*lenp) return 0; + if (write) { + /* Just reset the min stat. */ + struct lnet_net *net; + struct lnet_ni *ni; + + lnet_net_lock(0); + + list_for_each_entry(net, &the_lnet.ln_nets, net_list) { + list_for_each_entry(ni, &net->net_ni_list, ni_netlist) { + struct lnet_tx_queue *tq; + int i; + int j; + + cfs_percpt_for_each(tq, i, ni->ni_tx_queues) { + for (j = 0; ni->ni_cpts && + j < ni->ni_ncpts; j++) { + if (i == ni->ni_cpts[j]) + break; + } + + if (j == ni->ni_ncpts) + continue; + + if (i != 0) + lnet_net_lock(i); + tq->tq_credits_min = tq->tq_credits; + if (i != 0) + lnet_net_unlock(i); + } + } + } + lnet_net_unlock(0); + *ppos += *lenp; + return 0; + } + tmpstr = kvmalloc(tmpsiz, GFP_KERNEL); if (!tmpstr) return -ENOMEM; @@ -847,7 +902,7 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write, }, { .procname = "peers", - .mode = 0444, + .mode = 0644, .proc_handler = &proc_lnet_peers, }, { @@ -857,7 +912,7 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write, }, { .procname = "nis", - .mode = 0444, + .mode = 0644, .proc_handler = &proc_lnet_nis, }, { From patchwork Wed Sep 26 02:47:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615117 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0B2C161F for ; Wed, 26 Sep 2018 02:48:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B1A822A69D for ; Wed, 26 Sep 2018 02:48:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A64652A879; Wed, 26 Sep 2018 02:48:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3D4622A69D for ; Wed, 26 Sep 2018 02:48:43 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6551C21F5D3; Tue, 25 Sep 2018 19:48:35 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 05B8C21F499 for ; Tue, 25 Sep 2018 19:48:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 346AF100536D; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2B26539B; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:55 -0400 Message-Id: <1537930097-11624-4-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/25] lustre: lnet: remove debug ioctl X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber Remove a debug ioctl that was added to allow for debug messages from user space. However, the code is currently not being used. Signed-off-by: Amir Shehata Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26688 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../lustre/include/uapi/linux/lnet/libcfs_ioctl.h | 3 +-- .../lustre/include/uapi/linux/lnet/lnet-dlc.h | 22 -------------------- drivers/staging/lustre/lnet/lnet/api-ni.c | 24 ---------------------- 3 files changed, 1 insertion(+), 48 deletions(-) diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h index a231f6d..2a9beed 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/libcfs_ioctl.h @@ -144,7 +144,6 @@ struct libcfs_debug_ioctl_data { #define IOC_LIBCFS_GET_LOCAL_NI _IOWR(IOC_LIBCFS_TYPE, 97, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_SET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 98, IOCTL_CONFIG_SIZE) #define IOC_LIBCFS_GET_NUMA_RANGE _IOWR(IOC_LIBCFS_TYPE, 99, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_DBG _IOWR(IOC_LIBCFS_TYPE, 100, IOCTL_CONFIG_SIZE) -#define IOC_LIBCFS_MAX_NR 100 +#define IOC_LIBCFS_MAX_NR 99 #endif /* __LIBCFS_IOCTL_H__ */ diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index d1e2911..2594642 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -94,10 +94,6 @@ struct lnet_ioctl_net_config { /* # different router buffer pools */ #define LNET_NRBPOOLS (LNET_LARGE_BUF_IDX + 1) -enum lnet_dbg_task { - LNET_DBG_INCR_DLC_SEQ = 0 -}; - struct lnet_ioctl_pool_cfg { struct { __u32 pl_npages; @@ -196,24 +192,6 @@ struct lnet_ioctl_peer { } pr_lnd_u; }; -struct lnet_dbg_task_info { - /* - * TODO: a union can be added if the task requires more - * information from user space to be carried out in kernel space. - */ -}; - -/* - * This structure is intended to allow execution of debugging tasks. This - * is not intended to be backwards compatible. Extra tasks can be added in - * the future - */ -struct lnet_ioctl_dbg { - struct libcfs_ioctl_hdr dbg_hdr; - enum lnet_dbg_task dbg_task; - char dbg_bulk[0]; -}; - struct lnet_ioctl_peer_cfg { struct libcfs_ioctl_hdr prcfg_hdr; lnet_nid_t prcfg_prim_nid; diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 9a09927..43e8db1 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1866,17 +1866,6 @@ void lnet_lib_exit(void) } EXPORT_SYMBOL(LNetNIFini); -static int lnet_handle_dbg_task(struct lnet_ioctl_dbg *dbg, - struct lnet_dbg_task_info *dbg_info) -{ - switch (dbg->dbg_task) { - case LNET_DBG_INCR_DLC_SEQ: - lnet_incr_dlc_seq(); - } - - return 0; -} - /** * Grabs the ni data from the ni structure and fills the out * parameters @@ -2845,19 +2834,6 @@ u32 lnet_get_dlc_seq_locked(void) return 0; } - case IOC_LIBCFS_DBG: { - struct lnet_ioctl_dbg *dbg = arg; - struct lnet_dbg_task_info *dbg_info; - size_t total = sizeof(*dbg) + sizeof(*dbg_info); - - if (dbg->dbg_hdr.ioc_len < total) - return -EINVAL; - - dbg_info = (struct lnet_dbg_task_info *)dbg->dbg_bulk; - - return lnet_handle_dbg_task(dbg, dbg_info); - } - default: ni = lnet_net2ni_addref(data->ioc_net); if (!ni) From patchwork Wed Sep 26 02:47:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615123 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E66D2161F for ; Wed, 26 Sep 2018 02:48:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D64C62A866 for ; Wed, 26 Sep 2018 02:48:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CAE592A879; Wed, 26 Sep 2018 02:48:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5C3942A866 for ; Wed, 26 Sep 2018 02:48:51 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8A13321F9F8; Tue, 25 Sep 2018 19:48:40 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A742921F499 for ; Tue, 25 Sep 2018 19:48:21 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 33BD2100536C; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 2E65939E; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:56 -0400 Message-Id: <1537930097-11624-5-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/25] lustre: lnet: Normalize ioctl interface X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata To avoid backwards compatibility issues between base MR and Dynamic Discovery standardize the ioctl interface by bringing in changes to the interface required by Dynamic Discovery now. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26689 Reviewed-by: Olaf Weber Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 5 ++- .../lustre/include/uapi/linux/lnet/lnet-dlc.h | 39 ++++++++++++++++--- drivers/staging/lustre/lnet/lnet/api-ni.c | 21 +++++----- drivers/staging/lustre/lnet/lnet/peer.c | 45 ++++++++++++++-------- 4 files changed, 77 insertions(+), 33 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 53cbf6d..f510b9e 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -665,8 +665,9 @@ bool lnet_peer_is_ni_pref_locked(struct lnet_peer_ni *lpni, int lnet_add_peer_ni_to_peer(lnet_nid_t key_nid, lnet_nid_t nid, bool mr); int lnet_del_peer_ni_from_peer(lnet_nid_t key_nid, lnet_nid_t nid); int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, - bool *mr, struct lnet_peer_ni_credit_info *peer_ni_info, - struct lnet_ioctl_element_stats *peer_ni_stats); + bool *mr, + struct lnet_peer_ni_credit_info __user *peer_ni_info, + struct lnet_ioctl_element_stats __user *peer_ni_stats); int lnet_get_peer_ni_info(__u32 peer_index, __u64 *nid, char alivness[LNET_MAX_STR_LEN], __u32 *cpt_iter, __u32 *refcount, diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index 2594642..e603455 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -104,6 +104,17 @@ struct lnet_ioctl_pool_cfg { __u32 pl_routing; }; +struct lnet_ioctl_ping_data { + struct libcfs_ioctl_hdr ping_hdr; + + __u32 op_param; + __u32 ping_count; + __u32 ping_flags; + bool mr_info; + struct lnet_process_id ping_id; + struct lnet_process_id __user *ping_buf; +}; + struct lnet_ioctl_config_data { struct libcfs_ioctl_hdr cfg_hdr; @@ -138,10 +149,26 @@ struct lnet_ioctl_config_data { char cfg_bulk[0]; }; +struct lnet_ioctl_comm_count { + __u32 ico_get_count; + __u32 ico_put_count; + __u32 ico_reply_count; + __u32 ico_ack_count; + __u32 ico_hello_count; +}; + struct lnet_ioctl_element_stats { - u32 send_count; - u32 recv_count; - u32 drop_count; + __u32 iel_send_count; + __u32 iel_recv_count; + __u32 iel_drop_count; +}; + +struct lnet_ioctl_element_msg_stats { + struct libcfs_ioctl_hdr im_hdr; + __u32 im_idx; + struct lnet_ioctl_comm_count im_send_stats; + struct lnet_ioctl_comm_count im_recv_stats; + struct lnet_ioctl_comm_count im_drop_stats; }; /* @@ -196,9 +223,11 @@ struct lnet_ioctl_peer_cfg { struct libcfs_ioctl_hdr prcfg_hdr; lnet_nid_t prcfg_prim_nid; lnet_nid_t prcfg_cfg_nid; - __u32 prcfg_idx; + __u32 prcfg_count; bool prcfg_mr; - char prcfg_bulk[0]; + __u32 prcfg_state; + __u32 prcfg_size; + void __user *prcfg_bulk; }; struct lnet_ioctl_numa_range { diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 43e8db1..2d430d0 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1904,8 +1904,8 @@ void lnet_lib_exit(void) memcpy(&tun->lt_cmn, &ni->ni_net->net_tunables, sizeof(tun->lt_cmn)); if (stats) { - stats->send_count = atomic_read(&ni->ni_stats.send_count); - stats->recv_count = atomic_read(&ni->ni_stats.recv_count); + stats->iel_send_count = atomic_read(&ni->ni_stats.send_count); + stats->iel_recv_count = atomic_read(&ni->ni_stats.recv_count); } /* @@ -2761,20 +2761,19 @@ u32 lnet_get_dlc_seq_locked(void) case IOC_LIBCFS_GET_PEER_NI: { struct lnet_ioctl_peer_cfg *cfg = arg; - struct lnet_peer_ni_credit_info *lpni_cri; - struct lnet_ioctl_element_stats *lpni_stats; - size_t total = sizeof(*cfg) + sizeof(*lpni_cri) + - sizeof(*lpni_stats); + struct lnet_peer_ni_credit_info __user *lpni_cri; + struct lnet_ioctl_element_stats __user *lpni_stats; + size_t usr_size = sizeof(*lpni_cri) + sizeof(*lpni_stats); - if (cfg->prcfg_hdr.ioc_len < total) + if ((cfg->prcfg_hdr.ioc_len != sizeof(*cfg)) || + (cfg->prcfg_size != usr_size)) return -EINVAL; - lpni_cri = (struct lnet_peer_ni_credit_info *)cfg->prcfg_bulk; - lpni_stats = (struct lnet_ioctl_element_stats *) - (cfg->prcfg_bulk + sizeof(*lpni_cri)); + lpni_cri = cfg->prcfg_bulk; + lpni_stats = cfg->prcfg_bulk + sizeof(*lpni_cri); mutex_lock(&the_lnet.ln_api_mutex); - rc = lnet_get_peer_info(cfg->prcfg_idx, &cfg->prcfg_prim_nid, + rc = lnet_get_peer_info(cfg->prcfg_count, &cfg->prcfg_prim_nid, &cfg->prcfg_cfg_nid, &cfg->prcfg_mr, lpni_cri, lpni_stats); mutex_unlock(&the_lnet.ln_api_mutex); diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 3a5f9db..ae3ffca 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -1179,12 +1179,16 @@ struct lnet_peer_ni * } int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, - bool *mr, struct lnet_peer_ni_credit_info *peer_ni_info, - struct lnet_ioctl_element_stats *peer_ni_stats) + bool *mr, + struct lnet_peer_ni_credit_info __user *peer_ni_info, + struct lnet_ioctl_element_stats __user *peer_ni_stats) { + struct lnet_ioctl_element_stats ni_stats; + struct lnet_peer_ni_credit_info ni_info; struct lnet_peer_ni *lpni = NULL; struct lnet_peer_net *lpn = NULL; struct lnet_peer *lp = NULL; + int rc; lpni = lnet_get_peer_ni_idx_locked(idx, &lpn, &lp); @@ -1194,24 +1198,35 @@ int lnet_get_peer_info(__u32 idx, lnet_nid_t *primary_nid, lnet_nid_t *nid, *primary_nid = lp->lp_primary_nid; *mr = lp->lp_multi_rail; *nid = lpni->lpni_nid; - snprintf(peer_ni_info->cr_aliveness, LNET_MAX_STR_LEN, "NA"); + snprintf(ni_info.cr_aliveness, LNET_MAX_STR_LEN, "NA"); if (lnet_isrouter(lpni) || lnet_peer_aliveness_enabled(lpni)) - snprintf(peer_ni_info->cr_aliveness, LNET_MAX_STR_LEN, + snprintf(ni_info.cr_aliveness, LNET_MAX_STR_LEN, lpni->lpni_alive ? "up" : "down"); - peer_ni_info->cr_refcount = atomic_read(&lpni->lpni_refcount); - peer_ni_info->cr_ni_peer_tx_credits = lpni->lpni_net ? + ni_info.cr_refcount = atomic_read(&lpni->lpni_refcount); + ni_info.cr_ni_peer_tx_credits = lpni->lpni_net ? lpni->lpni_net->net_tunables.lct_peer_tx_credits : 0; - peer_ni_info->cr_peer_tx_credits = lpni->lpni_txcredits; - peer_ni_info->cr_peer_rtr_credits = lpni->lpni_rtrcredits; - peer_ni_info->cr_peer_min_rtr_credits = lpni->lpni_minrtrcredits; - peer_ni_info->cr_peer_min_tx_credits = lpni->lpni_mintxcredits; - peer_ni_info->cr_peer_tx_qnob = lpni->lpni_txqnob; + ni_info.cr_peer_tx_credits = lpni->lpni_txcredits; + ni_info.cr_peer_rtr_credits = lpni->lpni_rtrcredits; + ni_info.cr_peer_min_rtr_credits = lpni->lpni_minrtrcredits; + ni_info.cr_peer_min_tx_credits = lpni->lpni_mintxcredits; + ni_info.cr_peer_tx_qnob = lpni->lpni_txqnob; - peer_ni_stats->send_count = atomic_read(&lpni->lpni_stats.send_count); - peer_ni_stats->recv_count = atomic_read(&lpni->lpni_stats.recv_count); - peer_ni_stats->drop_count = atomic_read(&lpni->lpni_stats.drop_count); + ni_stats.iel_send_count = atomic_read(&lpni->lpni_stats.send_count); + ni_stats.iel_recv_count = atomic_read(&lpni->lpni_stats.recv_count); + ni_stats.iel_drop_count = atomic_read(&lpni->lpni_stats.drop_count); - return 0; + /* If copy_to_user fails */ + rc = -EFAULT; + if (copy_to_user(peer_ni_info, &ni_info, sizeof(ni_info))) + goto copy_failed; + + if (copy_to_user(peer_ni_stats, &ni_stats, sizeof(ni_stats))) + goto copy_failed; + + rc = 0; + +copy_failed: + return rc; } From patchwork Wed Sep 26 02:47:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615111 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E60B161F for ; Wed, 26 Sep 2018 02:48:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7D1982A69D for ; Wed, 26 Sep 2018 02:48:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 712C62A879; Wed, 26 Sep 2018 02:48:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id F13992A69D for ; Wed, 26 Sep 2018 02:48:31 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 964C021F50A; Tue, 25 Sep 2018 19:48:28 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 573B821F4A9 for ; Tue, 25 Sep 2018 19:48:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 36CAB100536E; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3129646B; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:57 -0400 Message-Id: <1537930097-11624-6-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 05/25] lustre: lnet: fix race in lnet shutdown path X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber The locking changes for the lnet_net_lock made for Multi-Rail introduce a race in the LNet shutdown path. The code keeps two states in the_lnet.ln_shutdown: 0 means LNet is either up and running or shut down, while 1 means lnet is shutting down. In lnet_select_pathway() if we need to restart and drop and relock the lnet_net_lock we can find that LNet went from running to stopped, and not be able to tell the difference. Replace ln_shutdown with a three-state ln_state patterned on ln_rc_state: states are LNET_STATE_SHUTDOWN, LNET_STATE_RUNNING, and LNET_STATE_STOPPING. Most checks against ln_shutdown now test ln_state against LNET_STATE_RUNNING. LNet moves to RUNNING state in lnet_startup_lndnets(). Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26690 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons Signed-off-by: NeilBrown --- drivers/staging/lustre/include/linux/lnet/lib-types.h | 9 +++++++-- drivers/staging/lustre/lnet/lnet/api-ni.c | 15 ++++++++++++--- drivers/staging/lustre/lnet/lnet/lib-move.c | 2 +- drivers/staging/lustre/lnet/lnet/lib-ptl.c | 4 ++-- drivers/staging/lustre/lnet/lnet/peer.c | 10 +++++----- drivers/staging/lustre/lnet/lnet/router.c | 6 +++--- 6 files changed, 30 insertions(+), 16 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 18e2665..6abac19 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -726,6 +726,11 @@ struct lnet_msg_container { #define LNET_RC_STATE_RUNNING 1 /* started up OK */ #define LNET_RC_STATE_STOPPING 2 /* telling thread to stop */ +/* LNet states */ +#define LNET_STATE_SHUTDOWN 0 /* not started */ +#define LNET_STATE_RUNNING 1 /* started up OK */ +#define LNET_STATE_STOPPING 2 /* telling thread to stop */ + struct lnet { /* CPU partition table of LNet */ struct cfs_cpt_table *ln_cpt_table; @@ -805,8 +810,8 @@ struct lnet { int ln_niinit_self; /* LNetNIInit/LNetNIFini counter */ int ln_refcount; - /* shutdown in progress */ - int ln_shutdown; + /* SHUTDOWN/RUNNING/STOPPING */ + int ln_state; int ln_routing; /* am I a router? */ lnet_pid_t ln_pid; /* requested pid */ diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 2d430d0..7c907a3 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -1277,11 +1277,11 @@ struct lnet_ni * /* NB called holding the global mutex */ /* All quiet on the API front */ - LASSERT(!the_lnet.ln_shutdown); + LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); LASSERT(!the_lnet.ln_refcount); lnet_net_lock(LNET_LOCK_EX); - the_lnet.ln_shutdown = 1; /* flag shutdown */ + the_lnet.ln_state = LNET_STATE_STOPPING; while (!list_empty(&the_lnet.ln_nets)) { /* @@ -1309,7 +1309,7 @@ struct lnet_ni * } lnet_net_lock(LNET_LOCK_EX); - the_lnet.ln_shutdown = 0; + the_lnet.ln_state = LNET_STATE_SHUTDOWN; lnet_net_unlock(LNET_LOCK_EX); } @@ -1583,6 +1583,15 @@ struct lnet_ni * int rc; int ni_count = 0; + /* + * Change to running state before bringing up the LNDs. This + * allows lnet_shutdown_lndnets() to assert that we've passed + * through here. + */ + lnet_net_lock(LNET_LOCK_EX); + the_lnet.ln_state = LNET_STATE_RUNNING; + lnet_net_unlock(LNET_LOCK_EX); + while (!list_empty(netlist)) { net = list_entry(netlist->next, struct lnet_net, net_list); list_del_init(&net->net_list); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index a213387..ea7e2c3 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1242,7 +1242,7 @@ seq = lnet_get_dlc_seq_locked(); - if (the_lnet.ln_shutdown) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { lnet_net_unlock(cpt); return -ESHUTDOWN; } diff --git a/drivers/staging/lustre/lnet/lnet/lib-ptl.c b/drivers/staging/lustre/lnet/lnet/lib-ptl.c index d403353..6fa5bbf 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-ptl.c +++ b/drivers/staging/lustre/lnet/lnet/lib-ptl.c @@ -589,7 +589,7 @@ struct list_head * mtable = lnet_mt_of_match(info, msg); lnet_res_lock(mtable->mt_cpt); - if (the_lnet.ln_shutdown) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { rc = LNET_MATCHMD_DROP; goto out1; } @@ -951,7 +951,7 @@ struct list_head * list_move(&msg->msg_list, &zombies); } } else { - if (the_lnet.ln_shutdown) + if (the_lnet.ln_state != LNET_STATE_RUNNING) CWARN("Active lazy portal %d on exit\n", portal); else CDEBUG(D_NET, "clearing portal %d lazy\n", portal); diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index ae3ffca..2fbf93a 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -428,7 +428,7 @@ void lnet_peer_uninit(void) struct lnet_peer_table *ptable; int i; - LASSERT(the_lnet.ln_shutdown || net); + LASSERT(the_lnet.ln_state != LNET_STATE_SHUTDOWN || net); /* * If just deleting the peers for a NI, get rid of any routes these * peers are gateways for. @@ -458,7 +458,7 @@ void lnet_peer_uninit(void) struct list_head *peers; struct lnet_peer_ni *lp; - LASSERT(!the_lnet.ln_shutdown); + LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); peers = &ptable->pt_hash[lnet_nid2peerhash(nid)]; list_for_each_entry(lp, peers, lpni_hashlist) { @@ -1000,7 +1000,7 @@ struct lnet_peer_ni * struct lnet_peer_ni *lpni = NULL; int rc; - if (the_lnet.ln_shutdown) /* it's shutting down */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) return ERR_PTR(-ESHUTDOWN); /* @@ -1034,7 +1034,7 @@ struct lnet_peer_ni * struct lnet_peer_ni *lpni = NULL; int rc; - if (the_lnet.ln_shutdown) /* it's shutting down */ + if (the_lnet.ln_state != LNET_STATE_RUNNING) return ERR_PTR(-ESHUTDOWN); /* @@ -1063,7 +1063,7 @@ struct lnet_peer_ni * * Shutdown is only set under the ln_api_lock, so a single * check here is sufficent. */ - if (the_lnet.ln_shutdown) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { lpni = ERR_PTR(-ESHUTDOWN); goto out_mutex_unlock; } diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c index a0483f9..d3aef8b 100644 --- a/drivers/staging/lustre/lnet/lnet/router.c +++ b/drivers/staging/lustre/lnet/lnet/router.c @@ -252,7 +252,7 @@ struct lnet_remotenet * struct lnet_remotenet *rnet; struct list_head *rn_list; - LASSERT(!the_lnet.ln_shutdown); + LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); rn_list = lnet_net2rnethash(net); list_for_each_entry(rnet, rn_list, lrn_list) { @@ -374,7 +374,7 @@ static void lnet_shuffle_seed(void) return rc; } route->lr_gateway = lpni; - LASSERT(!the_lnet.ln_shutdown); + LASSERT(the_lnet.ln_state == LNET_STATE_RUNNING); rnet2 = lnet_find_rnet_locked(net); if (!rnet2) { @@ -1775,7 +1775,7 @@ int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg) lnet_net_lock(cpt); - if (the_lnet.ln_shutdown) { + if (the_lnet.ln_state != LNET_STATE_RUNNING) { lnet_net_unlock(cpt); return -ESHUTDOWN; } From patchwork Wed Sep 26 02:47:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615113 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 37E8213A4 for ; Wed, 26 Sep 2018 02:48:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 271A12A69D for ; Wed, 26 Sep 2018 02:48:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B75B2A879; Wed, 26 Sep 2018 02:48:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A9CD22A69D for ; Wed, 26 Sep 2018 02:48:36 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3550B21F659; Tue, 25 Sep 2018 19:48:31 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id AD04621F4A9 for ; Tue, 25 Sep 2018 19:48:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3856E100536F; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3587D604; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:58 -0400 Message-Id: <1537930097-11624-7-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 06/25] lustre: lnet: loopback NID in lnet_select_pathway() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber In lnet_select_pathway() sending to the loopback NID is handled as a special case, because there are no credits involved. (The loopback NID doesn't use credits, and therefore does not have any credits. If a message goes through the credit-managing code it therefore ends up waiting indefinitely for credits to become available.) The check whether we're sending over the loopback NID must be done after we've completed choosing the NI to send over. In its present location it only handles the case where the loopback NID was explicitly passed in as the source NID. (Lustre does not exercise this code path during normal operation, the bug was encountered while testing code for the peer discovery feature.) Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26692 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/lib-move.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index ea7e2c3..d533b8e 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1374,20 +1374,6 @@ } } - if (best_ni == the_lnet.ln_loni) { - /* No send credit hassles with LOLND */ - lnet_ni_addref_locked(best_ni, cpt); - msg->msg_hdr.dest_nid = cpu_to_le64(best_ni->ni_nid); - if (!msg->msg_routing) - msg->msg_hdr.src_nid = cpu_to_le64(best_ni->ni_nid); - msg->msg_target.nid = best_ni->ni_nid; - lnet_msg_commit(msg, cpt); - msg->msg_txni = best_ni; - lnet_net_unlock(cpt); - - return LNET_CREDIT_OK; - } - /* * if we already found a best_ni because src_nid is specified and * best_lpni because we are replying to a message then just send @@ -1630,6 +1616,21 @@ } send: + /* Shortcut for loopback. */ + if (best_ni == the_lnet.ln_loni) { + /* No send credit hassles with LOLND */ + lnet_ni_addref_locked(best_ni, cpt); + msg->msg_hdr.dest_nid = cpu_to_le64(best_ni->ni_nid); + if (!msg->msg_routing) + msg->msg_hdr.src_nid = cpu_to_le64(best_ni->ni_nid); + msg->msg_target.nid = best_ni->ni_nid; + lnet_msg_commit(msg, cpt); + msg->msg_txni = best_ni; + lnet_net_unlock(cpt); + + return LNET_CREDIT_OK; + } + routing = routing || routing2; /* From patchwork Wed Sep 26 02:47:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615115 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C790B161F for ; Wed, 26 Sep 2018 02:48:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B62AE2A69D for ; Wed, 26 Sep 2018 02:48:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AA6C62A86E; Wed, 26 Sep 2018 02:48:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 388712A866 for ; Wed, 26 Sep 2018 02:48:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7D53821F6BD; Tue, 25 Sep 2018 19:48:32 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EB99B21F4B9 for ; Tue, 25 Sep 2018 19:48:22 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3A9091005370; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 38CF2832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:47:59 -0400 Message-Id: <1537930097-11624-8-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 07/25] lustre: lnet: rename LNET_MAX_INTERFACES X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber LNET_MAX_INTERFACES is the number of interfaces supported by interface bonding in the ksocknal LND. It shows up in LNet because a number of data structures are shared between LNDs. Rename it to LNET_NUM_INTERFACES to reduce the confusion of what it does. Also it allows us to build the userland tools. Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26693 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-types.h | 2 +- .../lustre/include/uapi/linux/lnet/lnet-dlc.h | 4 ++-- .../lustre/include/uapi/linux/lnet/lnet-types.h | 2 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 2 +- .../staging/lustre/lnet/klnds/socklnd/socklnd.c | 22 +++++++++++----------- .../staging/lustre/lnet/klnds/socklnd/socklnd.h | 4 ++-- .../staging/lustre/lnet/klnds/socklnd/socklnd_cb.c | 2 +- .../lustre/lnet/klnds/socklnd/socklnd_proto.c | 4 ++-- drivers/staging/lustre/lnet/lnet/config.c | 10 +++++----- 9 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h b/drivers/staging/lustre/include/linux/lnet/lib-types.h index 6abac19..90f7577 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-types.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-types.h @@ -371,7 +371,7 @@ struct lnet_ni { * equivalent interfaces to use * This is an array because socklnd bonding can still be configured */ - char *ni_interfaces[LNET_MAX_INTERFACES]; + char *ni_interfaces[LNET_NUM_INTERFACES]; /* original net namespace */ struct net *ni_net_ns; }; diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index e603455..07516fd 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -81,7 +81,7 @@ struct lnet_ioctl_config_lnd_tunables { }; struct lnet_ioctl_net_config { - char ni_interfaces[LNET_MAX_INTERFACES][LNET_MAX_STR_LEN]; + char ni_interfaces[LNET_NUM_INTERFACES][LNET_MAX_STR_LEN]; __u32 ni_status; __u32 ni_cpts[LNET_MAX_SHOW_NUM_CPT]; char cfg_bulk[0]; @@ -184,7 +184,7 @@ struct lnet_ioctl_element_msg_stats { struct lnet_ioctl_config_ni { struct libcfs_ioctl_hdr lic_cfg_hdr; lnet_nid_t lic_nid; - char lic_ni_intf[LNET_MAX_INTERFACES][LNET_MAX_STR_LEN]; + char lic_ni_intf[LNET_NUM_INTERFACES][LNET_MAX_STR_LEN]; char lic_legacy_ip2nets[LNET_MAX_STR_LEN]; __u32 lic_cpts[LNET_MAX_SHOW_NUM_CPT]; __u32 lic_ncpts; diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h index 837e5fe..a373b9c 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-types.h @@ -264,7 +264,7 @@ struct lnet_counters { #define LNET_NI_STATUS_DOWN 0xdeadface #define LNET_NI_STATUS_INVALID 0x00000000 -#define LNET_MAX_INTERFACES 16 +#define LNET_NUM_INTERFACES 16 /** * Objects maintained by the LNet are accessed through handles. Handle types diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index b3a4344..aea83a5 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -2870,7 +2870,7 @@ static int kiblnd_startup(struct lnet_ni *ni) if (ni->ni_interfaces[0]) { /* Use the IPoIB interface specified in 'networks=' */ - BUILD_BUG_ON(LNET_MAX_INTERFACES <= 1); + BUILD_BUG_ON(LNET_NUM_INTERFACES <= 1); if (ni->ni_interfaces[1]) { CERROR("Multiple interfaces not supported\n"); goto failed; diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index 9b9cc87..ad94837 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -53,7 +53,7 @@ struct ksock_interface *iface; for (i = 0; i < net->ksnn_ninterfaces; i++) { - LASSERT(i < LNET_MAX_INTERFACES); + LASSERT(i < LNET_NUM_INTERFACES); iface = &net->ksnn_interfaces[i]; if (iface->ksni_ipaddr == ip) @@ -221,7 +221,7 @@ struct ksock_peer * struct ksock_interface *iface; for (i = 0; i < peer_ni->ksnp_n_passive_ips; i++) { - LASSERT(i < LNET_MAX_INTERFACES); + LASSERT(i < LNET_NUM_INTERFACES); ip = peer_ni->ksnp_passive_ips[i]; iface = ksocknal_ip2iface(peer_ni->ksnp_ni, ip); @@ -678,7 +678,7 @@ struct ksock_peer * read_lock(&ksocknal_data.ksnd_global_lock); nip = net->ksnn_ninterfaces; - LASSERT(nip <= LNET_MAX_INTERFACES); + LASSERT(nip <= LNET_NUM_INTERFACES); /* * Only offer interfaces for additional connections if I have @@ -759,8 +759,8 @@ struct ksock_peer * */ write_lock_bh(global_lock); - LASSERT(n_peerips <= LNET_MAX_INTERFACES); - LASSERT(net->ksnn_ninterfaces <= LNET_MAX_INTERFACES); + LASSERT(n_peerips <= LNET_NUM_INTERFACES); + LASSERT(net->ksnn_ninterfaces <= LNET_NUM_INTERFACES); /* * Only match interfaces for additional connections @@ -879,7 +879,7 @@ struct ksock_peer * return; } - LASSERT(npeer_ipaddrs <= LNET_MAX_INTERFACES); + LASSERT(npeer_ipaddrs <= LNET_NUM_INTERFACES); for (i = 0; i < npeer_ipaddrs; i++) { if (newroute) { @@ -908,7 +908,7 @@ struct ksock_peer * best_nroutes = 0; best_netmatch = 0; - LASSERT(net->ksnn_ninterfaces <= LNET_MAX_INTERFACES); + LASSERT(net->ksnn_ninterfaces <= LNET_NUM_INTERFACES); /* Select interface to connect from */ for (j = 0; j < net->ksnn_ninterfaces; j++) { @@ -1048,7 +1048,7 @@ struct ksock_peer * atomic_set(&conn->ksnc_tx_nob, 0); hello = kvzalloc(offsetof(struct ksock_hello_msg, - kshm_ips[LNET_MAX_INTERFACES]), + kshm_ips[LNET_NUM_INTERFACES]), GFP_KERNEL); if (!hello) { rc = -ENOMEM; @@ -1956,7 +1956,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) if (iface) { /* silently ignore dups */ rc = 0; - } else if (net->ksnn_ninterfaces == LNET_MAX_INTERFACES) { + } else if (net->ksnn_ninterfaces == LNET_NUM_INTERFACES) { rc = -ENOSPC; } else { iface = &net->ksnn_interfaces[net->ksnn_ninterfaces++]; @@ -2594,7 +2594,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) continue; } - if (j == LNET_MAX_INTERFACES) { + if (j == LNET_NUM_INTERFACES) { CWARN("Ignoring interface %s (too many interfaces)\n", name); continue; @@ -2782,7 +2782,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) net->ksnn_ninterfaces = rc; } else { - for (i = 0; i < LNET_MAX_INTERFACES; i++) { + for (i = 0; i < LNET_NUM_INTERFACES; i++) { if (!ni->ni_interfaces[i]) break; rc = ksocknal_enumerate_interfaces(net, ni->ni_interfaces[i]); diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h index cc813e4..95ca2aa 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h @@ -173,7 +173,7 @@ struct ksock_net { int ksnn_npeers; /* # peers */ int ksnn_shutdown; /* shutting down? */ int ksnn_ninterfaces; /* IP interfaces */ - struct ksock_interface ksnn_interfaces[LNET_MAX_INTERFACES]; + struct ksock_interface ksnn_interfaces[LNET_NUM_INTERFACES]; }; /** connd timeout */ @@ -441,7 +441,7 @@ struct ksock_peer { int ksnp_n_passive_ips; /* # of... */ /* preferred local interfaces */ - __u32 ksnp_passive_ips[LNET_MAX_INTERFACES]; + u32 ksnp_passive_ips[LNET_NUM_INTERFACES]; }; struct ksock_connreq { diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c index 2e99a17..73321a7 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c @@ -1580,7 +1580,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) /* CAVEAT EMPTOR: this byte flips 'ipaddrs' */ struct ksock_net *net = (struct ksock_net *)ni->ni_data; - LASSERT(hello->kshm_nips <= LNET_MAX_INTERFACES); + LASSERT(hello->kshm_nips <= LNET_NUM_INTERFACES); /* rely on caller to hold a ref on socket so it wouldn't disappear */ LASSERT(conn->ksnc_proto); diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c index 8c10eda..10a2757 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_proto.c @@ -614,7 +614,7 @@ hello->kshm_nips = le32_to_cpu(hdr->payload_length) / sizeof(__u32); - if (hello->kshm_nips > LNET_MAX_INTERFACES) { + if (hello->kshm_nips > LNET_NUM_INTERFACES) { CERROR("Bad nips %d from ip %pI4h\n", hello->kshm_nips, &conn->ksnc_ipaddr); rc = -EPROTO; @@ -684,7 +684,7 @@ __swab32s(&hello->kshm_nips); } - if (hello->kshm_nips > LNET_MAX_INTERFACES) { + if (hello->kshm_nips > LNET_NUM_INTERFACES) { CERROR("Bad nips %d from ip %pI4h\n", hello->kshm_nips, &conn->ksnc_ipaddr); return -EPROTO; diff --git a/drivers/staging/lustre/lnet/lnet/config.c b/drivers/staging/lustre/lnet/lnet/config.c index 4c22416..3ea56c8 100644 --- a/drivers/staging/lustre/lnet/lnet/config.c +++ b/drivers/staging/lustre/lnet/lnet/config.c @@ -123,10 +123,10 @@ struct lnet_text_buf { /* tmp struct for parsing routes */ /* check that the NI is unique to the interfaces with in the same NI. * This is only a consideration if use_tcp_bonding is set */ static bool -lnet_ni_unique_ni(char *iface_list[LNET_MAX_INTERFACES], char *iface) +lnet_ni_unique_ni(char *iface_list[LNET_NUM_INTERFACES], char *iface) { int i; - for (i = 0; i < LNET_MAX_INTERFACES; i++) { + for (i = 0; i < LNET_NUM_INTERFACES; i++) { if (iface_list[i] && strncmp(iface_list[i], iface, strlen(iface)) == 0) return false; @@ -304,7 +304,7 @@ struct lnet_text_buf { /* tmp struct for parsing routes */ kfree(ni->ni_cpts); - for (i = 0; i < LNET_MAX_INTERFACES && ni->ni_interfaces[i]; i++) + for (i = 0; i < LNET_NUM_INTERFACES && ni->ni_interfaces[i]; i++) kfree(ni->ni_interfaces[i]); /* release reference to net namespace */ @@ -397,11 +397,11 @@ struct lnet_net * * can free the tokens at the end of the function. * The newly allocated ni_interfaces[] can be * freed when freeing the NI */ - while (niface < LNET_MAX_INTERFACES && + while (niface < LNET_NUM_INTERFACES && ni->ni_interfaces[niface]) niface++; - if (niface >= LNET_MAX_INTERFACES) { + if (niface >= LNET_NUM_INTERFACES) { LCONSOLE_ERROR_MSG(0x115, "Too many interfaces " "for net %s\n", libcfs_net2str(LNET_NIDNET(ni->ni_nid))); From patchwork Wed Sep 26 02:48:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615119 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6D82313A4 for ; Wed, 26 Sep 2018 02:48:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5E65D2A866 for ; Wed, 26 Sep 2018 02:48:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 534612A879; Wed, 26 Sep 2018 02:48:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id E4C552A866 for ; Wed, 26 Sep 2018 02:48:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 93A4B21F898; Tue, 25 Sep 2018 19:48:36 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 48F2521F4B9 for ; Tue, 25 Sep 2018 19:48:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 3D55F1005371; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3BDA9833; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:00 -0400 Message-Id: <1537930097-11624-9-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 08/25] lustre: lnet: selftest MR fix X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata selftest always responded to the primary nid of the peer rather than the source of the message, which it should be. Signed-off-by: Amir Shehata Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26723 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/selftest/rpc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lnet/selftest/rpc.c b/drivers/staging/lustre/lnet/selftest/rpc.c index 298de41..26132ab 100644 --- a/drivers/staging/lustre/lnet/selftest/rpc.c +++ b/drivers/staging/lustre/lnet/selftest/rpc.c @@ -1482,7 +1482,7 @@ struct srpc_client_rpc * sv->sv_shuttingdown); buffer = container_of(ev->md.start, struct srpc_buffer, buf_msg); - buffer->buf_peer = ev->initiator; + buffer->buf_peer = ev->source; buffer->buf_self = ev->target.nid; LASSERT(scd->scd_buf_nposted > 0); From patchwork Wed Sep 26 02:48:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615143 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C49E174A for ; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 586372A69D for ; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4C9252A86E; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D644D2A866 for ; Wed, 26 Sep 2018 02:49:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EE65521FDAA; Tue, 25 Sep 2018 19:48:59 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 87F7821F4CD for ; Tue, 25 Sep 2018 19:48:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4049B1005372; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 3EF99832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:01 -0400 Message-Id: <1537930097-11624-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/25] lustre: lnet: prevent assert on ln_state X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata lnet_peer_primary_nid() is called from lnet_parse. It checks ln_state outside the net lock, causing a race condition during shutdown where the code expects the state to be running, but it's stopping or shutdown. Fixed the issue by renaming lnet_peer_primary_nid() to lnet_peer_primary_nid_locked(). This function is now called when lnet_net_lock is held in lnet_parse(). In lnet_create_reply_msg() we already have access to the msg_txpeer, so we lookup the primary_nid directly Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9549 Reviewed-on: https://review.whamcloud.com/27262 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/include/linux/lnet/lib-lnet.h | 2 +- drivers/staging/lustre/lnet/lnet/lib-move.c | 7 +++---- drivers/staging/lustre/lnet/lnet/peer.c | 5 +---- 3 files changed, 5 insertions(+), 9 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index f510b9e..6bfdc9b 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -653,7 +653,7 @@ struct lnet_peer_ni *lnet_get_next_peer_ni_locked(struct lnet_peer *peer, struct lnet_peer_ni *lnet_nid2peerni_ex(lnet_nid_t nid, int cpt); struct lnet_peer_ni *lnet_find_peer_ni_locked(lnet_nid_t nid); void lnet_peer_net_added(struct lnet_net *net); -lnet_nid_t lnet_peer_primary_nid(lnet_nid_t nid); +lnet_nid_t lnet_peer_primary_nid_locked(lnet_nid_t nid); void lnet_peer_tables_cleanup(struct lnet_net *net); void lnet_peer_uninit(void); int lnet_peer_tables_create(void); diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index d533b8e..2cf9c89 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -2338,8 +2338,6 @@ msg->msg_hdr.dest_pid = dest_pid; msg->msg_hdr.payload_length = payload_length; } - /* Multi-Rail: Primary NID of source. */ - msg->msg_initiator = lnet_peer_primary_nid(src_nid); lnet_net_lock(cpt); lpni = lnet_nid2peerni_locked(from_nid, cpt); @@ -2357,6 +2355,8 @@ msg->msg_rxpeer = lpni; msg->msg_rxni = ni; lnet_ni_addref_locked(ni, cpt); + /* Multi-Rail: Primary NID of source. */ + msg->msg_initiator = lnet_peer_primary_nid_locked(src_nid); if (lnet_isrouter(msg->msg_rxpeer)) { lnet_peer_set_alive(msg->msg_rxpeer); @@ -2658,8 +2658,7 @@ struct lnet_msg * libcfs_nid2str(ni->ni_nid), libcfs_id2str(peer_id), getmd); /* setup information for lnet_build_msg_event */ - msg->msg_initiator = lnet_peer_primary_nid(peer_id.nid); - /* Cheaper: msg->msg_initiator = getmsg->msg_txpeer->lp_nid; */ + msg->msg_initiator = getmsg->msg_txpeer->lpni_peer_net->lpn_peer->lp_primary_nid; msg->msg_from = peer_id.nid; msg->msg_type = LNET_MSG_GET; /* flag this msg as an "optimized" GET */ msg->msg_hdr.src_nid = peer_id.nid; diff --git a/drivers/staging/lustre/lnet/lnet/peer.c b/drivers/staging/lustre/lnet/lnet/peer.c index 2fbf93a..ebb8435 100644 --- a/drivers/staging/lustre/lnet/lnet/peer.c +++ b/drivers/staging/lustre/lnet/lnet/peer.c @@ -593,19 +593,16 @@ struct lnet_peer_ni * } lnet_nid_t -lnet_peer_primary_nid(lnet_nid_t nid) +lnet_peer_primary_nid_locked(lnet_nid_t nid) { struct lnet_peer_ni *lpni; lnet_nid_t primary_nid = nid; - int cpt; - cpt = lnet_net_lock_current(); lpni = lnet_find_peer_ni_locked(nid); if (lpni) { primary_nid = lpni->lpni_peer_net->lpn_peer->lp_primary_nid; lnet_peer_ni_decref_locked(lpni); } - lnet_net_unlock(cpt); return primary_nid; } From patchwork Wed Sep 26 02:48:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615127 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C0F4413A4 for ; Wed, 26 Sep 2018 02:49:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B08CC2A69D for ; Wed, 26 Sep 2018 02:49:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A509D2A879; Wed, 26 Sep 2018 02:49:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 28DA92A69D for ; Wed, 26 Sep 2018 02:48:57 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5837821FAB4; Tue, 25 Sep 2018 19:48:44 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id DA77121F4CD for ; Tue, 25 Sep 2018 19:48:23 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4325D1005373; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 41E25832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:02 -0400 Message-Id: <1537930097-11624-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/25] lustre: lnet: increment per NI stats X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata Increment the per NI stats for messages being routed. This will give a better view of the traffic distribution over multiple peer interfaces. Added extra trace messages to track the messages sent and received. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26907 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/lib-move.c | 16 ++++++++++++++++ drivers/staging/lustre/lnet/lnet/lib-msg.c | 8 ++++++-- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 2cf9c89..275e8a9 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1723,6 +1723,16 @@ rc = lnet_post_send_locked(msg, 0); + if (!rc) + CDEBUG(D_NET, "TRACE: %s(%s:%s) -> %s(%s:%s) : %s\n", + libcfs_nid2str(msg->msg_hdr.src_nid), + libcfs_nid2str(msg->msg_txni->ni_nid), + libcfs_nid2str(src_nid), + libcfs_nid2str(msg->msg_hdr.dest_nid), + libcfs_nid2str(dst_nid), + libcfs_nid2str(msg->msg_txpeer->lpni_nid), + lnet_msgtyp2str(msg->msg_type)); + lnet_net_unlock(cpt); return rc; @@ -2195,6 +2205,12 @@ for_me = (ni->ni_nid == dest_nid); cpt = lnet_cpt_of_nid(from_nid, ni); + CDEBUG(D_NET, "TRACE: %s(%s) <- %s : %s\n", + libcfs_nid2str(dest_nid), + libcfs_nid2str(ni->ni_nid), + libcfs_nid2str(src_nid), + lnet_msgtyp2str(type)); + switch (type) { case LNET_MSG_ACK: case LNET_MSG_GET: diff --git a/drivers/staging/lustre/lnet/lnet/lib-msg.c b/drivers/staging/lustre/lnet/lnet/lib-msg.c index 00be9ab..1817e54 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-msg.c +++ b/drivers/staging/lustre/lnet/lnet/lib-msg.c @@ -187,7 +187,7 @@ counters->route_length += msg->msg_len; counters->route_count++; - goto out; + goto incr_stats; case LNET_EVENT_PUT: /* should have been decommitted */ @@ -215,6 +215,8 @@ } counters->send_count++; + +incr_stats: if (msg->msg_txpeer) atomic_inc(&msg->msg_txpeer->lpni_stats.send_count); if (msg->msg_txni) @@ -241,7 +243,7 @@ default: LASSERT(!ev->type); LASSERT(msg->msg_routing); - goto out; + goto incr_stats; case LNET_EVENT_ACK: LASSERT(msg->msg_type == LNET_MSG_ACK); @@ -274,6 +276,8 @@ } counters->recv_count++; + +incr_stats: if (msg->msg_rxpeer) atomic_inc(&msg->msg_rxpeer->lpni_stats.recv_count); if (msg->msg_rxni) From patchwork Wed Sep 26 02:48:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615137 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A6D25161F for ; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 96ACB2A69D for ; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8B2902A879; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 221D82A69D for ; Wed, 26 Sep 2018 02:49:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 059F421F4E0; Tue, 25 Sep 2018 19:48:56 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 3284521F4E0 for ; Tue, 25 Sep 2018 19:48:24 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 45D3C1005374; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 44BCA832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:03 -0400 Message-Id: <1537930097-11624-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/25] lustre: lnet: Fix lost lock X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Dmitry Eremin Unlock lnet_net_lock in case of error in function lnet_select_pathway(). Signed-off-by: Dmitry Eremin WC-bug-id: https://jira.whamcloud.com/browse/LU-9607 Reviewed-on: https://review.whamcloud.com/27455 Reviewed-by: Doug Oucharek Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/lib-move.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 275e8a9..f2bc97d 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1260,6 +1260,7 @@ } if (!peer->lp_multi_rail && lnet_get_num_peer_nis(peer) > 1) { + lnet_net_unlock(cpt); CERROR("peer %s is declared to be non MR capable, yet configured with more than one NID\n", libcfs_nid2str(dst_nid)); return -EINVAL; From patchwork Wed Sep 26 02:48:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615147 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B2249174A for ; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9F70F2A69D for ; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 940CE2A86E; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1B7A12A866 for ; Wed, 26 Sep 2018 02:49:28 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7B96621FA19; Tue, 25 Sep 2018 19:49:04 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 73AC021F4E0 for ; Tue, 25 Sep 2018 19:48:24 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 49F8E1005377; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 477C6832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:04 -0400 Message-Id: <1537930097-11624-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/25] lustre: lnet: correct locking in legacy add net X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata Make sure to unlock the api mutex properly in lnet_dyn_add_net() Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9729 Reviewed-on: https://review.whamcloud.com/27907 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/api-ni.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/api-ni.c b/drivers/staging/lustre/lnet/lnet/api-ni.c index 7c907a3..b37abde 100644 --- a/drivers/staging/lustre/lnet/lnet/api-ni.c +++ b/drivers/staging/lustre/lnet/lnet/api-ni.c @@ -2435,7 +2435,7 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) if (rc > 1) { rc = -EINVAL; /* only add one network per call */ - goto failed; + goto out_unlock_clean; } net = list_entry(net_head.next, struct lnet_net, net_list); @@ -2455,14 +2455,11 @@ int lnet_dyn_del_ni(struct lnet_ioctl_config_ni *conf) conf->cfg_config_u.cfg_net.net_max_tx_credits; rc = lnet_add_net_common(net, &tun); - if (rc != 0) - goto failed; - return 0; - -failed: +out_unlock_clean: mutex_unlock(&the_lnet.ln_api_mutex); while (!list_empty(&net_head)) { + /* net_head list is empty in success case */ net = list_entry(net_head.next, struct lnet_net, net_list); list_del_init(&net->net_list); lnet_net_free(net); From patchwork Wed Sep 26 02:48:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615121 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BCDC813A4 for ; Wed, 26 Sep 2018 02:48:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AB8512A69D for ; Wed, 26 Sep 2018 02:48:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A068E2A879; Wed, 26 Sep 2018 02:48:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 25CFF2A69D for ; Wed, 26 Sep 2018 02:48:48 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8EEC221F935; Tue, 25 Sep 2018 19:48:38 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B183B21F4E0 for ; Tue, 25 Sep 2018 19:48:24 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4C2AB1005378; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4A615833; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:05 -0400 Message-Id: <1537930097-11624-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/25] lustre: lnet: fix lnet_cpt_of_md() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata The intent of this function is to get the cpt nearest to the memory described by the MD. There are three scenarios that must be handled: 1. The memory is described by an lnet_kiov_t structure -> this describes kernel pages 2. The memory is described by a struct kvec -> this describes kernel logical addresses 3. The memory is a contiguous buffer allocated via vmalloc For case 1 and 2 we look at the first vector which contains the data to be DMAed, taking into consideration the msg offset. For case 2 we have to take the extra step of translating the kernel logical address to a physical page using virt_to_page() macro. For case 3 we need to use is_vmalloc_addr() and vmalloc_to_page to get the associated page to be able to identify the CPT. o2iblnd uses the same strategy when it's mapping the memory into a scatter/gather list. Therefore, lnet_kvaddr_to_page() common function was created to be used by both the o2iblnd and lnet_cpt_of_md() kmap_to_page() performs the high memory check which lnet_kvaddr_to_page() does. However, unlike the latter it handles the highmem case properly instead of calling LBUG. It's not 100% clear why the code was written that way. Since the legacy code will need to still be maintained, adding kmap_to_page() will not simplify the code. At worst calling kmap_to_page() might mask some problems which would've been caught by the LBUG earlier on. However, at the time of this fix, that LBUG has never been observed. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9203 Reviewed-on: https://review.whamcloud.com/28165 Reviewed-by: Dmitry Eremin Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/include/linux/lnet/lib-lnet.h | 3 +- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 25 +----- drivers/staging/lustre/lnet/lnet/lib-md.c | 96 ++++++++++++++++++---- drivers/staging/lustre/lnet/lnet/lib-move.c | 2 +- 4 files changed, 82 insertions(+), 44 deletions(-) diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h index 6bfdc9b..16e64d8 100644 --- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h +++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h @@ -595,7 +595,8 @@ void lnet_copy_kiov2iter(struct iov_iter *to, void lnet_md_unlink(struct lnet_libmd *md); void lnet_md_deconstruct(struct lnet_libmd *lmd, struct lnet_md *umd); -int lnet_cpt_of_md(struct lnet_libmd *md); +struct page *lnet_kvaddr_to_page(unsigned long vaddr); +int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset); void lnet_register_lnd(struct lnet_lnd *lnd); void lnet_unregister_lnd(struct lnet_lnd *lnd); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index debed17..a6b261a 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -531,29 +531,6 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, kiblnd_drop_rx(rx); /* Don't re-post rx. */ } -static struct page * -kiblnd_kvaddr_to_page(unsigned long vaddr) -{ - struct page *page; - - if (is_vmalloc_addr((void *)vaddr)) { - page = vmalloc_to_page((void *)vaddr); - LASSERT(page); - return page; - } -#ifdef CONFIG_HIGHMEM - if (vaddr >= PKMAP_BASE && - vaddr < (PKMAP_BASE + LAST_PKMAP * PAGE_SIZE)) { - /* No highmem pages only used for bulk (kiov) I/O */ - CERROR("find page for address in highmem\n"); - LBUG(); - } -#endif - page = virt_to_page(vaddr); - LASSERT(page); - return page; -} - static int kiblnd_fmr_map_tx(struct kib_net *net, struct kib_tx *tx, struct kib_rdma_desc *rd, __u32 nob) { @@ -660,7 +637,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, vaddr = ((unsigned long)iov->iov_base) + offset; page_offset = vaddr & (PAGE_SIZE - 1); - page = kiblnd_kvaddr_to_page(vaddr); + page = lnet_kvaddr_to_page(vaddr); if (!page) { CERROR("Can't find page\n"); return -EFAULT; diff --git a/drivers/staging/lustre/lnet/lnet/lib-md.c b/drivers/staging/lustre/lnet/lnet/lib-md.c index 9e26911..db5425e 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-md.c +++ b/drivers/staging/lustre/lnet/lnet/lib-md.c @@ -84,33 +84,93 @@ kfree(md); } -int -lnet_cpt_of_md(struct lnet_libmd *md) +struct page *lnet_kvaddr_to_page(unsigned long vaddr) { - int cpt = CFS_CPT_ANY; + if (is_vmalloc_addr((void *)vaddr)) + return vmalloc_to_page((void *)vaddr); + +#ifdef CONFIG_HIGHMEM + return kmap_to_page((void *)vaddr); +#else + return virt_to_page(vaddr); +#endif /* CONFIG_HIGHMEM */ +} +EXPORT_SYMBOL(lnet_kvaddr_to_page); - if (!md) - return CFS_CPT_ANY; +int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset) +{ + int cpt = CFS_CPT_ANY; + unsigned int niov; - if ((md->md_options & LNET_MD_BULK_HANDLE) != 0 && - md->md_bulk_handle.cookie != LNET_WIRE_HANDLE_COOKIE_NONE) { + /* + * if the md_options has a bulk handle then we want to look at the + * bulk md because that's the data which we will be DMAing + */ + if (md && (md->md_options & LNET_MD_BULK_HANDLE) != 0 && + md->md_bulk_handle.cookie != LNET_WIRE_HANDLE_COOKIE_NONE) md = lnet_handle2md(&md->md_bulk_handle); - if (!md) - return CFS_CPT_ANY; - } + if (!md || md->md_niov == 0) + return CFS_CPT_ANY; + + niov = md->md_niov; + /* + * There are three cases to handle: + * 1. The MD is using lnet_kiov_t + * 2. The MD is using struct kvec + * 3. Contiguous buffer allocated via vmalloc + * + * in case 2 we can use virt_to_page() macro to get the page + * address of the memory kvec describes. + * + * in case 3 use is_vmalloc_addr() and vmalloc_to_page() + * + * The offset provided can be within the first iov/kiov entry or + * it could go beyond it. In that case we need to make sure to + * look at the page which actually contains the data that will be + * DMAed. + */ if ((md->md_options & LNET_MD_KIOV) != 0) { - if (md->md_iov.kiov[0].bv_page) - cpt = cfs_cpt_of_node( - lnet_cpt_table(), - page_to_nid(md->md_iov.kiov[0].bv_page)); - } else if (md->md_iov.iov[0].iov_base) { - cpt = cfs_cpt_of_node( - lnet_cpt_table(), - page_to_nid(virt_to_page(md->md_iov.iov[0].iov_base))); + struct bio_vec *kiov = md->md_iov.kiov; + + while (offset >= kiov->bv_len) { + offset -= kiov->bv_len; + niov--; + kiov++; + if (niov == 0) { + CERROR("offset %d goes beyond kiov\n", offset); + goto out; + } + } + + cpt = cfs_cpt_of_node(lnet_cpt_table(), + page_to_nid(kiov->bv_page)); + } else { + struct kvec *iov = md->md_iov.iov; + unsigned long vaddr; + struct page *page; + + while (offset >= iov->iov_len) { + offset -= iov->iov_len; + niov--; + iov++; + if (niov == 0) { + CERROR("offset %d goes beyond iov\n", offset); + goto out; + } + } + + vaddr = ((unsigned long)iov->iov_base) + offset; + page = lnet_kvaddr_to_page(vaddr); + if (!page) { + CERROR("Couldn't resolve vaddr 0x%lx to page\n", vaddr); + goto out; + } + cpt = cfs_cpt_of_node(lnet_cpt_table(), page_to_nid(page)); } +out: return cpt; } diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index f2bc97d..4d74421 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -1226,7 +1226,7 @@ */ cpt = lnet_net_lock_current(); - md_cpt = lnet_cpt_of_md(msg->msg_md); + md_cpt = lnet_cpt_of_md(msg->msg_md, msg->msg_offset); if (md_cpt == CFS_CPT_ANY) md_cpt = cpt; From patchwork Wed Sep 26 02:48:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615141 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 482F813A4 for ; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 38D472A86E for ; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2CF6C2A879; Wed, 26 Sep 2018 02:49:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B69D02A69D for ; Wed, 26 Sep 2018 02:49:20 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 9B24C21FAA3; Tue, 25 Sep 2018 19:48:59 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1108521F519 for ; Tue, 25 Sep 2018 19:48:25 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4EC95100537A; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4D790832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:06 -0400 Message-Id: <1537930097-11624-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/25] lustre: lnet: safe access to msg X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata When tx credits are returned if there are pending messages they need to be sent. Messages could have different tx_cpts, so the correct one needs to be locked. After lnet_post_send_locked(), if we locked a different CPT then we need to relock the correct one However, as part of lnet_post_send_locked(), lnet_finalze() can be called which can free the message. Therefore, the cpt of the message being passed must be cached in order to prevent access to freed memory. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9817 Reviewed-on: https://review.whamcloud.com/28308 Reviewed-by: Olaf Weber Reviewed-by: Sonia Sharma Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/lnet/lib-move.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/staging/lustre/lnet/lnet/lib-move.c b/drivers/staging/lustre/lnet/lnet/lib-move.c index 4d74421..e8c0216 100644 --- a/drivers/staging/lustre/lnet/lnet/lib-move.c +++ b/drivers/staging/lustre/lnet/lnet/lib-move.c @@ -847,6 +847,8 @@ txpeer->lpni_txcredits++; if (txpeer->lpni_txcredits <= 0) { + int msg2_cpt; + msg2 = list_entry(txpeer->lpni_txq.next, struct lnet_msg, msg_list); list_del(&msg2->msg_list); @@ -855,13 +857,26 @@ LASSERT(msg2->msg_txpeer == txpeer); LASSERT(msg2->msg_tx_delayed); - if (msg2->msg_tx_cpt != msg->msg_tx_cpt) { + msg2_cpt = msg2->msg_tx_cpt; + + /* + * The msg_cpt can be different from the msg2_cpt + * so we need to make sure we lock the correct cpt + * for msg2. + * Once we call lnet_post_send_locked() it is no + * longer safe to access msg2, since it could've + * been freed by lnet_finalize(), but we still + * need to relock the correct cpt, so we cache the + * msg2_cpt for the purpose of the check that + * follows the call to lnet_pose_send_locked(). + */ + if (msg2_cpt != msg->msg_tx_cpt) { lnet_net_unlock(msg->msg_tx_cpt); - lnet_net_lock(msg2->msg_tx_cpt); + lnet_net_lock(msg2_cpt); } (void)lnet_post_send_locked(msg2, 1); - if (msg2->msg_tx_cpt != msg->msg_tx_cpt) { - lnet_net_unlock(msg2->msg_tx_cpt); + if (msg2_cpt != msg->msg_tx_cpt) { + lnet_net_unlock(msg2_cpt); lnet_net_lock(msg->msg_tx_cpt); } } else { From patchwork Wed Sep 26 02:48:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615149 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 34B2F161F for ; Wed, 26 Sep 2018 02:49:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 233192A866 for ; Wed, 26 Sep 2018 02:49:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 155002A86E; Wed, 26 Sep 2018 02:49:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 983B52A69D for ; Wed, 26 Sep 2018 02:49:34 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E710E21FE65; Tue, 25 Sep 2018 19:49:08 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4EAD421F52A for ; Tue, 25 Sep 2018 19:48:25 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5259C1005382; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 508A1832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:07 -0400 Message-Id: <1537930097-11624-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/25] lustre: o2iblnd: reconnect peer for REJ_INVALID_SERVICE_ID X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Sergey Cheremencev Don't kill the peer in case of INVALID_SERVICE_ID. This produces huge number of peers for the same nid and may cause an OOM. The OOM was frequently seen with mlnx-ofa-kernel-2.3 where used RCU mechanism in mlx4_cq_free. In older mlnx4 versions to mitigate the issue RCU was changed with spin locks. Signed-off-by: Sergey Cheremencev WC-bug-id: https://jira.whamcloud.com/browse/LU-9094 Seagate-bug-id: MRP-4056 Reviewed-on: https://review.whamcloud.com/25378 Reviewed-by: Doug Oucharek Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 1 + drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index a3d89ec..de04355 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -460,6 +460,7 @@ struct kib_rej { #define IBLND_REJECT_RDMA_FRAGS 6 /* peer_ni's msg queue size doesn't match mine */ #define IBLND_REJECT_MSG_QUEUE_SIZE 7 +#define IBLND_REJECT_INVALID_SRV_ID 8 /***********************************************************************/ diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index a6b261a..dc71554 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -2611,6 +2611,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, case IBLND_REJECT_CONN_UNCOMPAT: reason = "version negotiation"; break; + + case IBLND_REJECT_INVALID_SRV_ID: + reason = "invalid service id"; + break; } conn->ibc_reconnect = 1; @@ -2648,6 +2652,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, break; case IB_CM_REJ_INVALID_SERVICE_ID: + kiblnd_check_reconnect(conn, IBLND_MSG_VERSION, 0, + IBLND_REJECT_INVALID_SRV_ID, NULL); CNETERR("%s rejected: no listener at %d\n", libcfs_nid2str(peer_ni->ibp_nid), *kiblnd_tunables.kib_service); From patchwork Wed Sep 26 02:48:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615125 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8BA6161F for ; Wed, 26 Sep 2018 02:48:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9917D2A69D for ; Wed, 26 Sep 2018 02:48:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8DAEC2A86E; Wed, 26 Sep 2018 02:48:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 644AF2A69D for ; Wed, 26 Sep 2018 02:48:53 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 4E76621F55F; Tue, 25 Sep 2018 19:48:42 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F04021F52A for ; Tue, 25 Sep 2018 19:48:25 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 552471005393; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 53684833; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:08 -0400 Message-Id: <1537930097-11624-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/25] lustre: o2iblnd: kill timedout txs from ibp_tx_queue X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Sergey Cheremencev Sometimes connection can't be established for a long time due to rejections and produces cycle of reconnections. Peer is not removed in each iteration unlike connection. Thus until connection becomes established txs live in peer->ibp_tx_queue. This patch adds tx_deadline checking for txs from peer tx_queue. Signed-off-by: Sergey Cheremencev WC-bug-id: https://jira.whamcloud.com/browse/LU-9094 Seagate-bug-id: MRP-4056 Reviewed-on: https://review.whamcloud.com/25376 Reviewed-by: Doug Oucharek Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index dc71554..3218999 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -3159,8 +3159,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, { LIST_HEAD(closes); LIST_HEAD(checksends); + LIST_HEAD(timedout_txs); struct list_head *peers = &kiblnd_data.kib_peers[idx]; struct kib_peer_ni *peer_ni; + struct kib_tx *tx_tmp, *tx; struct kib_conn *conn; unsigned long flags; @@ -3169,9 +3171,19 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, * RDMAs to time out, so we just use a shared lock while we * take a look... */ - read_lock_irqsave(&kiblnd_data.kib_global_lock, flags); + write_lock_irqsave(&kiblnd_data.kib_global_lock, flags); list_for_each_entry(peer_ni, peers, ibp_list) { + /* Check tx_deadline */ + list_for_each_entry_safe(tx, tx_tmp, &peer_ni->ibp_tx_queue, tx_list) { + if (ktime_compare(ktime_get(), tx->tx_deadline) >= 0) { + CWARN("Timed out tx for %s: %lld seconds\n", + libcfs_nid2str(peer_ni->ibp_nid), + ktime_ms_delta(ktime_get(), + tx->tx_deadline) / MSEC_PER_SEC); + list_move(&tx->tx_list, &timedout_txs); + } + } list_for_each_entry(conn, &peer_ni->ibp_conns, ibc_list) { int timedout; @@ -3207,7 +3219,10 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, } } - read_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); + write_unlock_irqrestore(&kiblnd_data.kib_global_lock, flags); + + if (!list_empty(&timedout_txs)) + kiblnd_txlist_done(&timedout_txs, -ETIMEDOUT); /* * Handle timeout by closing the whole From patchwork Wed Sep 26 02:48:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615131 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 49FB6161F for ; Wed, 26 Sep 2018 02:49:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 382FE2A69D for ; Wed, 26 Sep 2018 02:49:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2CB702A879; Wed, 26 Sep 2018 02:49:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 95CA12A69D for ; Wed, 26 Sep 2018 02:49:03 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 690C821FB81; Tue, 25 Sep 2018 19:48:48 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D4DBC21F53B for ; Tue, 25 Sep 2018 19:48:25 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5822F1005394; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 56D31832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:09 -0400 Message-Id: <1537930097-11624-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 17/25] lustre: o2iblnd: multiple sges for work request X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Liang Zhen In current protocol, lnet router cannot align buffer for rdma, o2iblnd may run into "too fragmented RDMA" issue while routing non-page-aligned IO larger than 512K, because each page will be splited into two fragments by kiblnd_init_rdma(). With this patch, o2iblnd can have multiple sges for each work request, and combine multiple remote fragments of the same page into one work request to resovle the "too fragmented RDMA" issue. Signed-off-by: Liang Zhen WC-bug-id: https://jira.whamcloud.com/browse/LU-5718 Reviewed-on: https://review.whamcloud.com/12451 Reviewed-by: Amir Shehata Reviewed-by: Nathaniel Clark Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 13 ++-- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 5 ++ .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 73 ++++++++++++---------- .../lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 7 ++- 4 files changed, 60 insertions(+), 38 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index aea83a5..9e8248e 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -761,7 +761,7 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, init_qp_attr->qp_context = conn; init_qp_attr->cap.max_send_wr = IBLND_SEND_WRS(conn); init_qp_attr->cap.max_recv_wr = IBLND_RECV_WRS(conn); - init_qp_attr->cap.max_send_sge = 1; + init_qp_attr->cap.max_send_sge = *kiblnd_tunables.kib_wrq_sge; init_qp_attr->cap.max_recv_sge = 1; init_qp_attr->sq_sig_type = IB_SIGNAL_REQ_WR; init_qp_attr->qp_type = IB_QPT_RC; @@ -772,9 +772,11 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr); if (rc) { - CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d\n", + CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d, send_sge: %d, recv_sge: %d\n", rc, init_qp_attr->cap.max_send_wr, - init_qp_attr->cap.max_recv_wr); + init_qp_attr->cap.max_recv_wr, + init_qp_attr->cap.max_send_sge, + init_qp_attr->cap.max_recv_sge); goto failed_2; } @@ -2039,6 +2041,7 @@ static int kiblnd_create_tx_pool(struct kib_poolset *ps, int size, for (i = 0; i < size; i++) { struct kib_tx *tx = &tpo->tpo_tx_descs[i]; + int wrq_sge = *kiblnd_tunables.kib_wrq_sge; tx->tx_pool = tpo; if (ps->ps_net->ibn_fmr_ps) { @@ -2063,8 +2066,8 @@ static int kiblnd_create_tx_pool(struct kib_poolset *ps, int size, break; tx->tx_sge = kzalloc_cpt((1 + IBLND_MAX_RDMA_FRAGS) * - sizeof(*tx->tx_sge), - GFP_NOFS, ps->ps_cpt); + wrq_sge * sizeof(*tx->tx_sge), + GFP_KERNEL, ps->ps_cpt); if (!tx->tx_sge) break; diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index de04355..f21bdee 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -89,6 +89,7 @@ struct kib_tunables { int *kib_require_priv_port; /* accept only privileged ports */ int *kib_use_priv_port; /* use privileged port for active connect */ int *kib_nscheds; /* # threads on each CPT */ + int *kib_wrq_sge; /* # sg elements per wrq */ }; extern struct kib_tunables kiblnd_tunables; @@ -495,7 +496,11 @@ struct kib_tx { /* transmit message */ struct kib_msg *tx_msg; /* message buffer (host vaddr) */ __u64 tx_msgaddr; /* message buffer (I/O addr) */ DECLARE_PCI_UNMAP_ADDR(tx_msgunmap); /* for dma_unmap_single() */ + /** sge for tx_msgaddr */ + struct ib_sge tx_msgsge; int tx_nwrq; /* # send work items */ + /* # used scatter/gather elements */ + int tx_nsge; struct ib_rdma_wr *tx_wrq; /* send work items... */ struct ib_sge *tx_sge; /* ...and their memory */ struct kib_rdma_desc *tx_rd; /* rdma descriptor */ diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index 3218999..80398c1 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -79,6 +79,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, } tx->tx_nwrq = 0; + tx->tx_nsge = 0; tx->tx_status = 0; kiblnd_pool_free_node(&tx->tx_pool->tpo_pool, &tx->tx_list); @@ -415,6 +416,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, * (b) tx_waiting set tells tx_complete() it's not done. */ tx->tx_nwrq = 0; /* overwrite PUT_REQ */ + tx->tx_nsge = 0; rc2 = kiblnd_init_rdma(conn, tx, IBLND_MSG_PUT_DONE, kiblnd_rd_size(&msg->ibm_u.putack.ibpam_rd), @@ -724,7 +726,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, LASSERT(tx->tx_queued); /* We rely on this for QP sizing */ - LASSERT(tx->tx_nwrq > 0); + LASSERT(tx->tx_nwrq > 0 && tx->tx_nsge >= 0); LASSERT(!credit || credit == 1); LASSERT(conn->ibc_outstanding_credits >= 0); @@ -988,7 +990,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, int body_nob) { struct kib_hca_dev *hdev = tx->tx_pool->tpo_hdev; - struct ib_sge *sge = &tx->tx_sge[tx->tx_nwrq]; + struct ib_sge *sge = &tx->tx_msgsge; struct ib_rdma_wr *wrq = &tx->tx_wrq[tx->tx_nwrq]; int nob = offsetof(struct kib_msg, ibm_u) + body_nob; @@ -1020,17 +1022,17 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, { struct kib_msg *ibmsg = tx->tx_msg; struct kib_rdma_desc *srcrd = tx->tx_rd; - struct ib_sge *sge = &tx->tx_sge[0]; - struct ib_rdma_wr *wrq, *next; + struct ib_rdma_wr *wrq = NULL; + struct ib_sge *sge; int rc = resid; int srcidx = 0; int dstidx = 0; - int wrknob; + int sge_nob; + int wrq_sge; LASSERT(!in_interrupt()); - LASSERT(!tx->tx_nwrq); - LASSERT(type == IBLND_MSG_GET_DONE || - type == IBLND_MSG_PUT_DONE); + LASSERT(!tx->tx_nwrq && !tx->tx_nsge); + LASSERT(type == IBLND_MSG_GET_DONE || type == IBLND_MSG_PUT_DONE); if (kiblnd_rd_size(srcrd) > conn->ibc_max_frags << PAGE_SHIFT) { CERROR("RDMA is too large for peer_ni %s (%d), src size: %d dst size: %d\n", @@ -1041,7 +1043,10 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, goto too_big; } - while (resid > 0) { + for (srcidx = dstidx = wrq_sge = sge_nob = 0; + resid > 0; resid -= sge_nob) { + int prev = dstidx; + if (srcidx >= srcrd->rd_nfrags) { CERROR("Src buffer exhausted: %d frags\n", srcidx); rc = -EPROTO; @@ -1064,40 +1069,44 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, break; } - wrknob = min3(kiblnd_rd_frag_size(srcrd, srcidx), - kiblnd_rd_frag_size(dstrd, dstidx), - (__u32)resid); + sge_nob = min3(kiblnd_rd_frag_size(srcrd, srcidx), + kiblnd_rd_frag_size(dstrd, dstidx), + (u32)resid); - sge = &tx->tx_sge[tx->tx_nwrq]; + sge = &tx->tx_sge[tx->tx_nsge]; sge->addr = kiblnd_rd_frag_addr(srcrd, srcidx); sge->lkey = kiblnd_rd_frag_key(srcrd, srcidx); - sge->length = wrknob; - - wrq = &tx->tx_wrq[tx->tx_nwrq]; - next = wrq + 1; + sge->length = sge_nob; - wrq->wr.next = &next->wr; - wrq->wr.wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_RDMA); - wrq->wr.sg_list = sge; - wrq->wr.num_sge = 1; - wrq->wr.opcode = IB_WR_RDMA_WRITE; - wrq->wr.send_flags = 0; + if (wrq_sge == 0) { + wrq = &tx->tx_wrq[tx->tx_nwrq]; - wrq->remote_addr = kiblnd_rd_frag_addr(dstrd, dstidx); - wrq->rkey = kiblnd_rd_frag_key(dstrd, dstidx); + wrq->wr.next = &(wrq + 1)->wr; + wrq->wr.wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_RDMA); + wrq->wr.sg_list = sge; + wrq->wr.opcode = IB_WR_RDMA_WRITE; + wrq->wr.send_flags = 0; - srcidx = kiblnd_rd_consume_frag(srcrd, srcidx, wrknob); - dstidx = kiblnd_rd_consume_frag(dstrd, dstidx, wrknob); + wrq->remote_addr = kiblnd_rd_frag_addr(dstrd, dstidx); + wrq->rkey = kiblnd_rd_frag_key(dstrd, dstidx); + } - resid -= wrknob; + srcidx = kiblnd_rd_consume_frag(srcrd, srcidx, sge_nob); + dstidx = kiblnd_rd_consume_frag(dstrd, dstidx, sge_nob); - tx->tx_nwrq++; - wrq++; - sge++; + wrq_sge++; + if (wrq_sge == *kiblnd_tunables.kib_wrq_sge || dstidx != prev) { + tx->tx_nwrq++; + wrq->wr.num_sge = wrq_sge; + wrq_sge = 0; + } + tx->tx_nsge++; } too_big: - if (rc < 0) /* no RDMA if completing with failure */ + if (rc < 0) { /* no RDMA if completing with failure */ + tx->tx_nsge = 0; tx->tx_nwrq = 0; + } ibmsg->ibm_u.completion.ibcm_status = rc; ibmsg->ibm_u.completion.ibcm_cookie = dstcookie; diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 5117594..891708e 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -147,6 +147,10 @@ module_param(use_privileged_port, int, 0644); MODULE_PARM_DESC(use_privileged_port, "use privileged port when initiating connection"); +static unsigned int wrq_sge = 1; +module_param(wrq_sge, uint, 0444); +MODULE_PARM_DESC(wrq_sge, "# scatter/gather element per work request"); + struct kib_tunables kiblnd_tunables = { .kib_dev_failover = &dev_failover, .kib_service = &service, @@ -160,7 +164,8 @@ struct kib_tunables kiblnd_tunables = { .kib_ib_mtu = &ib_mtu, .kib_require_priv_port = &require_privileged_port, .kib_use_priv_port = &use_privileged_port, - .kib_nscheds = &nscheds + .kib_nscheds = &nscheds, + .kib_wrq_sge = &wrq_sge, }; static struct lnet_ioctl_config_o2iblnd_tunables default_tunables; From patchwork Wed Sep 26 02:48:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615145 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44C1913A4 for ; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3542C2A69D for ; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2843E2A879; Wed, 26 Sep 2018 02:49:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B04E42A69D for ; Wed, 26 Sep 2018 02:49:27 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1F41521FB4C; Tue, 25 Sep 2018 19:49:04 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 348BE21F53B for ; Tue, 25 Sep 2018 19:48:26 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5AF151005395; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 59E05832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:10 -0400 Message-Id: <1537930097-11624-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/25] lustre: lnd: Turn on 2 sges by default X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Doug Oucharek Currently, the fix in LU-5718 which allows for multiple sges to deal with RDMA fragmentation is turned off by default (set to 1). This patch changes the default to 2 so RDMA fragmentation is fixed by default. Signed-off-by: Doug Oucharek WC-bug-id: https://jira.whamcloud.com/browse/LU-9425 Reviewed-on: https://review.whamcloud.com/26911 Reviewed-by: Ned Bass Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 891708e..3bb537e 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -147,7 +147,7 @@ module_param(use_privileged_port, int, 0644); MODULE_PARM_DESC(use_privileged_port, "use privileged port when initiating connection"); -static unsigned int wrq_sge = 1; +static unsigned int wrq_sge = 2; module_param(wrq_sge, uint, 0444); MODULE_PARM_DESC(wrq_sge, "# scatter/gather element per work request"); From patchwork Wed Sep 26 02:48:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615151 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A7D74161F for ; Wed, 26 Sep 2018 02:49:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9771A2A69D for ; Wed, 26 Sep 2018 02:49:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8BE662A879; Wed, 26 Sep 2018 02:49:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 1C2642A69D for ; Wed, 26 Sep 2018 02:49:38 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 58AD321FBC9; Tue, 25 Sep 2018 19:49:12 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7277E21F53B for ; Tue, 25 Sep 2018 19:48:26 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5DE3F1005396; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5CB99832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:11 -0400 Message-Id: <1537930097-11624-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 19/25] lustre: lnd: Don't Assert On Reconnect with MultiQP X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Doug Oucharek The work from LU-8943 activated the ability to have multiple connections between peers. If any of those connections need to be reconnected (i.e. parameter renegotiation), we were getting an assert from kiblnd_reconnect_peer() which was not changed to allow for having multiple connections ongoing at the same time. This patch gets rid of the assert from kiblnd_reconnect_peer() which is no longer valid after LU-8943. Signed-off-by: Doug Oucharek WC-bug-id: https://jira.whamcloud.com/browse/LU-9507 Reviewed-on: https://review.whamcloud.com/27139 Reviewed-by: Amir Shehata Reviewed-by: James Simmons Reviewed-by: Dmitry Eremin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index 80398c1..db563c0 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -1303,8 +1303,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, goto no_reconnect; } - LASSERT(!peer_ni->ibp_accepting && !peer_ni->ibp_connecting && - list_empty(&peer_ni->ibp_conns)); + if (peer_ni->ibp_accepting) + CNETERR("Detecting race between accepting and reconnecting\n"); peer_ni->ibp_reconnecting--; if (!kiblnd_peer_active(peer_ni)) { From patchwork Wed Sep 26 02:48:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615153 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 091BA13A4 for ; Wed, 26 Sep 2018 02:49:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E77862A866 for ; Wed, 26 Sep 2018 02:49:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DC36D2A879; Wed, 26 Sep 2018 02:49:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7A7E72A86E for ; Wed, 26 Sep 2018 02:49:41 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id D1E1221FE95; Tue, 25 Sep 2018 19:49:15 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B3FAE21F570 for ; Tue, 25 Sep 2018 19:48:26 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 60A9D1005397; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F9F1832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:12 -0400 Message-Id: <1537930097-11624-21-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 20/25] lustre: lnet: handle empty CPTs X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata Make the LND code handle empty CPTs. If a scheduler is associated with an empty CPT it will have no threads created. If a NID hashes to that CPT, then pick the next scheduler which does have at least 1 started thread. Associate the connection with the CPT of the selected scheduler Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9448 Reviewed-on: https://review.whamcloud.com/27145 Reviewed-by: Doug Oucharek Reviewed-by: Dmitry Eremin Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 42 ++++++++++++++++++++-- .../staging/lustre/lnet/klnds/socklnd/socklnd.c | 25 ++++++++++++- 2 files changed, 64 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 9e8248e..0ce9962 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -622,6 +622,36 @@ static int kiblnd_get_completion_vector(struct kib_conn *conn, int cpt) return 1; } +/* + * Get the scheduler bound to this CPT. If the scheduler has no + * threads, which means that the CPT has no CPUs, then grab the + * next scheduler that we can use. + * + * This case would be triggered if a NUMA node is configured with + * no associated CPUs. + */ +static struct kib_sched_info *kiblnd_get_scheduler(int cpt) +{ + struct kib_sched_info *sched; + int i; + + sched = kiblnd_data.kib_scheds[cpt]; + + if (sched->ibs_nthreads > 0) + return sched; + + cfs_percpt_for_each(sched, i, kiblnd_data.kib_scheds) { + if (sched->ibs_nthreads > 0) { + CDEBUG(D_NET, + "scheduler[%d] has no threads. selected scheduler[%d]\n", + cpt, sched->ibs_cpt); + return sched; + } + } + + return NULL; +} + struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, struct rdma_cm_id *cmid, int state, int version) @@ -656,9 +686,17 @@ struct kib_conn *kiblnd_create_conn(struct kib_peer_ni *peer_ni, dev = net->ibn_dev; cpt = lnet_cpt_of_nid(peer_ni->ibp_nid, peer_ni->ibp_ni); - sched = kiblnd_data.kib_scheds[cpt]; + sched = kiblnd_get_scheduler(cpt); + if (!sched) { + CERROR("no schedulers available. node is unhealthy\n"); + goto failed_0; + } - LASSERT(sched->ibs_nthreads > 0); + /* + * The cpt might have changed if we ended up selecting a non cpt + * native scheduler. So use the scheduler's cpt instead. + */ + cpt = sched->ibs_cpt; init_qp_attr = kzalloc_cpt(sizeof(*init_qp_attr), GFP_NOFS, cpt); if (!init_qp_attr) { diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index ad94837..1a49f5e 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -652,8 +652,19 @@ struct ksock_peer * struct ksock_sched *sched; int i; - LASSERT(info->ksi_nthreads > 0); + if (info->ksi_nthreads == 0) { + cfs_percpt_for_each(info, i, ksocknal_data.ksnd_sched_info) { + if (info->ksi_nthreads > 0) { + CDEBUG(D_NET, + "scheduler[%d] has no threads. selected scheduler[%d]\n", + cpt, info->ksi_cpt); + goto select_sched; + } + } + return NULL; + } +select_sched: sched = &info->ksi_scheds[0]; /* * NB: it's safe so far, but info->ksi_nthreads could be changed @@ -1255,6 +1266,15 @@ struct ksock_peer * peer_ni->ksnp_error = 0; sched = ksocknal_choose_scheduler_locked(cpt); + if (!sched) { + CERROR("no schedulers available. node is unhealthy\n"); + goto failed_2; + } + /* + * The cpt might have changed if we ended up selecting a non cpt + * native scheduler. So use the scheduler's cpt instead. + */ + cpt = sched->kss_info->ksi_cpt; sched->kss_nconns++; conn->ksnc_scheduler = sched; @@ -2402,6 +2422,9 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id) info->ksi_nthreads_max = nthrs; info->ksi_cpt = i; + if (nthrs == 0) + continue; + info->ksi_scheds = kzalloc_cpt(info->ksi_nthreads_max * sizeof(*sched), GFP_NOFS, i); if (!info->ksi_scheds) From patchwork Wed Sep 26 02:48:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615129 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C183161F for ; Wed, 26 Sep 2018 02:49:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2ABA72A69D for ; Wed, 26 Sep 2018 02:49:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1F7F22A86E; Wed, 26 Sep 2018 02:49:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AAC352A866 for ; Wed, 26 Sep 2018 02:48:59 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id CC98C21F83C; Tue, 25 Sep 2018 19:48:45 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 14FE021F57B for ; Tue, 25 Sep 2018 19:48:27 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 637D31005398; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 62765832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:13 -0400 Message-Id: <1537930097-11624-22-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 21/25] lustre: lnet: set LND tunables properly X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata Make sure to set all NIs to the proper LND tunables specified. Add ntx tunable to dynamic configuration. This way all tunables required to tune OPA performance can be configured via lnetctl, allowing the ability to tune OPA network and IB network differently Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9536 Reviewed-on: https://review.whamcloud.com/27263 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: Olaf Weber Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h | 2 +- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 14 +++++++++----- .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c | 4 +++- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h index 07516fd..8f03aa3 100644 --- a/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h +++ b/drivers/staging/lustre/include/uapi/linux/lnet/lnet-dlc.h @@ -66,7 +66,7 @@ struct lnet_ioctl_config_o2iblnd_tunables { __u32 lnd_fmr_flush_trigger; __u32 lnd_fmr_cache; __u16 lnd_conns_per_peer; - __u16 pad; + __u16 lnd_ntx; }; struct lnet_lnd_tunables { diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 0ce9962..50c0c00 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -2034,9 +2034,13 @@ static void kiblnd_destroy_tx_pool(struct kib_pool *pool) kfree(tpo); } -static int kiblnd_tx_pool_size(int ncpts) +static int kiblnd_tx_pool_size(struct lnet_ni *ni, int ncpts) { - int ntx = *kiblnd_tunables.kib_ntx / ncpts; + struct lnet_ioctl_config_o2iblnd_tunables *tunables; + int ntx; + + tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; + ntx = tunables->lnd_ntx / ncpts; return max(IBLND_TX_POOL, ntx); } @@ -2176,10 +2180,10 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib; - if (tunables->lnd_fmr_pool_size < *kiblnd_tunables.kib_ntx / 4) { + if (tunables->lnd_fmr_pool_size < tunables->lnd_ntx / 4) { CERROR("Can't set fmr pool size (%d) < ntx / 4(%d)\n", tunables->lnd_fmr_pool_size, - *kiblnd_tunables.kib_ntx / 4); + tunables->lnd_ntx / 4); rc = -EINVAL; goto failed; } @@ -2237,7 +2241,7 @@ static int kiblnd_net_init_pools(struct kib_net *net, struct lnet_ni *ni, cpt = !cpts ? i : cpts[i]; rc = kiblnd_init_poolset(&net->ibn_tx_ps[cpt]->tps_poolset, cpt, net, "TX", - kiblnd_tx_pool_size(ncpts), + kiblnd_tx_pool_size(ni, ncpts), kiblnd_create_tx_pool, kiblnd_destroy_tx_pool, kiblnd_tx_init, NULL); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c index 3bb537e..0f2ad91 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_modparams.c @@ -157,7 +157,6 @@ struct kib_tunables kiblnd_tunables = { .kib_cksum = &cksum, .kib_timeout = &timeout, .kib_keepalive = &keepalive, - .kib_ntx = &ntx, .kib_default_ipif = &ipif_name, .kib_retry_count = &retry_count, .kib_rnr_retry_count = &rnr_retry_count, @@ -282,6 +281,8 @@ int kiblnd_tunables_setup(struct lnet_ni *ni) tunables->lnd_fmr_flush_trigger = fmr_flush_trigger; if (!tunables->lnd_fmr_cache) tunables->lnd_fmr_cache = fmr_cache; + if (!tunables->lnd_ntx) + tunables->lnd_ntx = ntx; if (!tunables->lnd_conns_per_peer) { tunables->lnd_conns_per_peer = (conns_per_peer) ? conns_per_peer : 1; @@ -299,5 +300,6 @@ void kiblnd_tunables_init(void) default_tunables.lnd_fmr_pool_size = fmr_pool_size; default_tunables.lnd_fmr_flush_trigger = fmr_flush_trigger; default_tunables.lnd_fmr_cache = fmr_cache; + default_tunables.lnd_ntx = ntx; default_tunables.lnd_conns_per_peer = conns_per_peer; } From patchwork Wed Sep 26 02:48:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615155 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7D0E013A4 for ; Wed, 26 Sep 2018 02:49:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6D25B2A69D for ; Wed, 26 Sep 2018 02:49:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6090B2A86E; Wed, 26 Sep 2018 02:49:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id C3BD52A69D for ; Wed, 26 Sep 2018 02:49:44 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 526C121FCE5; Tue, 25 Sep 2018 19:49:19 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 68A5421F4DC for ; Tue, 25 Sep 2018 19:48:27 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 67C0D1005399; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 656A9832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:14 -0400 Message-Id: <1537930097-11624-23-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 22/25] lustre: lnd: Don't Page Align remote_addr with FastReg X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Doug Oucharek Trying to page align the remote_addr for IB_RDMA_WRITE work requests is triggering "dump_cqe" errors from MOFED 4.x + mlx5. This patch removes the address masking we were doing with FastReg which was trying to page align remote_addr values. I am also removing the setting of "mr->iova" with FastReg as this is being done in the call to ib_map_mr_sg() and could cause problems. Signed-off-by: Doug Oucharek WC-bug-id: https://jira.whamcloud.com/browse/LU-9500 Reviewed-on: https://review.whamcloud.com/27149 Reviewed-by: Gu Zheng Reviewed-by: Dmitry Eremin Reviewed-by: Amir Shehata Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c | 6 +++--- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h | 2 +- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 6 ++++-- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c index 50c0c00..c207663 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c @@ -1651,7 +1651,7 @@ void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status) int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, struct kib_rdma_desc *rd, __u32 nob, __u64 iov, - struct kib_fmr *fmr) + struct kib_fmr *fmr, bool *is_fastreg) { __u64 *pages = tx->tx_pages; bool is_rx = (rd != tx->tx_rd); @@ -1671,6 +1671,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, if (fpo->fpo_is_fmr) { struct ib_pool_fmr *pfmr; + *is_fastreg = 0; spin_unlock(&fps->fps_lock); if (!tx_pages_mapped) { @@ -1690,6 +1691,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, } rc = PTR_ERR(pfmr); } else { + *is_fastreg = 1; if (!list_empty(&fpo->fast_reg.fpo_pool_list)) { struct kib_fast_reg_descriptor *frd; struct ib_reg_wr *wr; @@ -1727,8 +1729,6 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, return n < 0 ? n : -EINVAL; } - mr->iova = iov; - /* Prepare FastReg WR */ wr = &frd->frd_fastreg_wr; memset(wr, 0, sizeof(*wr)); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h index f21bdee..a4438d2 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h @@ -1003,7 +1003,7 @@ static inline unsigned int kiblnd_sg_dma_len(struct ib_device *dev, int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, struct kib_rdma_desc *rd, __u32 nob, __u64 iov, - struct kib_fmr *fmr); + struct kib_fmr *fmr, bool *is_fastreg); void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status); int kiblnd_tunables_setup(struct lnet_ni *ni); diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index db563c0..346d368 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -538,6 +538,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, { struct kib_hca_dev *hdev; struct kib_fmr_poolset *fps; + bool is_fastreg = 0; int cpt; int rc; @@ -548,7 +549,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, cpt = tx->tx_pool->tpo_pool.po_owner->ps_cpt; fps = net->ibn_fmr_ps[cpt]; - rc = kiblnd_fmr_pool_map(fps, tx, rd, nob, 0, &tx->fmr); + rc = kiblnd_fmr_pool_map(fps, tx, rd, nob, 0, &tx->fmr, &is_fastreg); if (rc) { CERROR("Can't map %u bytes: %d\n", nob, rc); return rc; @@ -559,7 +560,8 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, * who will need the rkey */ rd->rd_key = tx->fmr.fmr_key; - rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask; + if (!is_fastreg) + rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask; rd->rd_frags[0].rf_nob = nob; rd->rd_nfrags = 1; From patchwork Wed Sep 26 02:48:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615133 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9B90613A4 for ; Wed, 26 Sep 2018 02:49:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B33A2A69D for ; Wed, 26 Sep 2018 02:49:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7FB8C2A86E; Wed, 26 Sep 2018 02:49:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 053442A866 for ; Wed, 26 Sep 2018 02:49:05 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id C6F7221F5A5; Tue, 25 Sep 2018 19:48:49 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B9FAD21F597 for ; Tue, 25 Sep 2018 19:48:27 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6A039100539A; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 683C5833; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:15 -0400 Message-Id: <1537930097-11624-24-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 23/25] lustre: lnd: pending transmits dropped silently X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Amir Shehata list_add was being used erroneously. The logic should be to move the txs on ibp_tx_queue on a local list which is then processed. The code, however, did the reverse, which would result in the pending txs not processed and thus dropped silently. This in turn would lead to peers reference counts at the LNet layer not decremented since lnet_finalize() might not be called for a message. Initialize local list and use list_splice_init() to move transmits on the ibp_tx_queue to the local list. Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-10682 Reviewed-on: https://review.whamcloud.com/31374 Reviewed-by: Doug Oucharek Reviewed-by: Sonia Sharma Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index 346d368..f2a01eb 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -2150,8 +2150,8 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid, } /* grab pending txs while I have the lock */ - list_add(&txs, &peer_ni->ibp_tx_queue); - list_del_init(&peer_ni->ibp_tx_queue); + INIT_LIST_HEAD(&txs); + list_splice_init(&peer_ni->ibp_tx_queue, &txs); if (!kiblnd_peer_active(peer_ni) || /* peer_ni has been deleted */ conn->ibc_comms_error) { /* error has happened already */ From patchwork Wed Sep 26 02:48:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615135 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3A92E161F for ; Wed, 26 Sep 2018 02:49:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 29FF32A69D for ; Wed, 26 Sep 2018 02:49:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1E5212A86E; Wed, 26 Sep 2018 02:49:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A7EDE2A69D for ; Wed, 26 Sep 2018 02:49:08 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 7578321F53B; Tue, 25 Sep 2018 19:48:52 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 03E4E21F597 for ; Tue, 25 Sep 2018 19:48:28 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6CA32100539B; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6B4E4832; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:16 -0400 Message-Id: <1537930097-11624-25-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 24/25] lustre: socklnd: propagate errors on send failure X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Olaf Weber When an attempt to send a message fails, for example because no connection could be established with the remote address, socklnd drops the message. For a PUT or REPLY message with non-zero payload, ksocknal_tx_done() calls lnet_finalize() with -EIO as the error code. But for an ACK or GET message there is no payload, and lnet_finalize() is called with 0 (no error) as the error code. This leaves upper layers to rely on other means to determine that sending the message did actually fail, and that (for example) no REPLY will ever answer a failed GET. Add an error code parameter to ksocknal_tx_done(). In ksocknal_txlist_done() change the 0/1 'error' indicator to be an actual error code that is passed on the ksocknal_tx_done(). Update the callers of ksocknal_txlist_done() to pass in the error code if they have encountered an error. Signed-off-by: Olaf Weber WC-bug-id: https://jira.whamcloud.com/browse/LU-9119 Reviewed-on: https://review.whamcloud.com/26691 Reviewed-by: Doug Oucharek Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c | 11 +++++++++-- drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h | 4 ++-- drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c | 15 +++++++-------- 3 files changed, 18 insertions(+), 12 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c index 1a49f5e..b2f0148 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.c @@ -607,7 +607,7 @@ struct ksock_peer * write_unlock_bh(&ksocknal_data.ksnd_global_lock); - ksocknal_txlist_done(ni, &zombies, 1); + ksocknal_txlist_done(ni, &zombies, -ENETDOWN); return rc; } @@ -1023,6 +1023,7 @@ struct ksock_peer * int cpt; struct ksock_tx *tx; struct ksock_tx *txtmp; + int rc2; int rc; int active; char *warn = NULL; @@ -1406,7 +1407,13 @@ struct ksock_peer * write_unlock_bh(global_lock); } - ksocknal_txlist_done(ni, &zombies, 1); + /* + * If we get here without an error code, just use -EALREADY. + * Depending on how we got here, the error may be positive + * or negative. Normalize the value for ksocknal_txlist_done(). + */ + rc2 = (rc == 0 ? -EALREADY : (rc > 0 ? -rc : rc)); + ksocknal_txlist_done(ni, &zombies, rc2); ksocknal_peer_decref(peer_ni); failed_1: diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h index 95ca2aa..82e3523 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd.h @@ -582,14 +582,14 @@ struct ksock_proto { } void ksocknal_tx_prep(struct ksock_conn *, struct ksock_tx *tx); -void ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx); +void ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx, int error); static inline void ksocknal_tx_decref(struct ksock_tx *tx) { LASSERT(atomic_read(&tx->tx_refcount) > 0); if (atomic_dec_and_test(&tx->tx_refcount)) - ksocknal_tx_done(NULL, tx); + ksocknal_tx_done(NULL, tx, 0); } static inline void diff --git a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c index 73321a7..dc9a129 100644 --- a/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/socklnd/socklnd_cb.c @@ -328,19 +328,18 @@ struct ksock_tx * } void -ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx) +ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx, int rc) { struct lnet_msg *lnetmsg = tx->tx_lnetmsg; - int rc = (!tx->tx_resid && !tx->tx_zc_aborted) ? 0 : -EIO; LASSERT(ni || tx->tx_conn); + if (!rc && (tx->tx_resid != 0 || tx->tx_zc_aborted)) + rc = -EIO; + if (tx->tx_conn) ksocknal_conn_decref(tx->tx_conn); - if (!ni && tx->tx_conn) - ni = tx->tx_conn->ksnc_peer->ksnp_ni; - ksocknal_free_tx(tx); if (lnetmsg) /* KSOCK_MSG_NOOP go without lnetmsg */ lnet_finalize(lnetmsg, rc); @@ -367,7 +366,7 @@ struct ksock_tx * list_del(&tx->tx_list); LASSERT(atomic_read(&tx->tx_refcount) == 1); - ksocknal_tx_done(ni, tx); + ksocknal_tx_done(ni, tx, error); } } @@ -1923,7 +1922,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) write_unlock_bh(&ksocknal_data.ksnd_global_lock); ksocknal_peer_failed(peer_ni); - ksocknal_txlist_done(peer_ni->ksnp_ni, &zombies, 1); + ksocknal_txlist_done(peer_ni->ksnp_ni, &zombies, rc); return 0; } @@ -2268,7 +2267,7 @@ void ksocknal_write_callback(struct ksock_conn *conn) write_unlock_bh(&ksocknal_data.ksnd_global_lock); - ksocknal_txlist_done(peer_ni->ksnp_ni, &stale_txs, 1); + ksocknal_txlist_done(peer_ni->ksnp_ni, &stale_txs, -ETIMEDOUT); } static int From patchwork Wed Sep 26 02:48:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 10615139 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E007D13A4 for ; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CD5E32A69D for ; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C1E742A86E; Wed, 26 Sep 2018 02:49:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 44EB82A866 for ; Wed, 26 Sep 2018 02:49:14 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6037821F55C; Tue, 25 Sep 2018 19:48:56 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5804721F4AA for ; Tue, 25 Sep 2018 19:48:28 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6F746100539C; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6E149833; Tue, 25 Sep 2018 22:48:19 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 25 Sep 2018 22:48:17 -0400 Message-Id: <1537930097-11624-26-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> References: <1537930097-11624-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 25/25] lustre: ko2iblnd: allow for discontiguous fragments X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "John L. Hammond" , Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: "John L. Hammond" In the IOVEC case the buffers passed to the LND may not span complete pages, therefore the RDMA descriptor needs to describe all the buffers. Moreover for the FMR case, the addresses that get set in the RDMA descriptor need to be relative addresses. This issue was exposed after ko2iblnd was changed to handle the removal of ib_get_dma_mr() Fastreg still expects only one fragment with the total nob. Otherwise there is a dump_cqe error from MLX5 Signed-off-by: John L. Hammond Signed-off-by: Amir Shehata WC-bug-id: https://jira.whamcloud.com/browse/LU-9983 Reviewed-on: https://review.whamcloud.com/29290 Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c index f2a01eb..b16153f 100644 --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -541,6 +541,7 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, bool is_fastreg = 0; int cpt; int rc; + int i; LASSERT(tx->tx_pool); LASSERT(tx->tx_pool->tpo_pool.po_owner); @@ -560,10 +561,15 @@ static int kiblnd_init_rdma(struct kib_conn *conn, struct kib_tx *tx, int type, * who will need the rkey */ rd->rd_key = tx->fmr.fmr_key; - if (!is_fastreg) - rd->rd_frags[0].rf_addr &= ~hdev->ibh_page_mask; - rd->rd_frags[0].rf_nob = nob; - rd->rd_nfrags = 1; + if (!is_fastreg) { + for (i = 0; i < rd->rd_nfrags; i++) { + rd->rd_frags[i].rf_addr &= ~hdev->ibh_page_mask; + rd->rd_frags[i].rf_addr += i << hdev->ibh_page_shift; + } + } else { + rd->rd_frags[0].rf_nob = nob; + rd->rd_nfrags = 1; + } return 0; }