From patchwork Tue Aug 11 12:20:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11709183 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84B771392 for ; Tue, 11 Aug 2020 12:21:02 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6C79E206C3 for ; Tue, 11 Aug 2020 12:21:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6C79E206C3 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id B40AF2F37A8; Tue, 11 Aug 2020 05:20:41 -0700 (PDT) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id EE1CC2F366B for ; Tue, 11 Aug 2020 05:20:26 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 1848D1005641; Tue, 11 Aug 2020 08:20:21 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 171BC2B7; Tue, 11 Aug 2020 08:20:21 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Tue, 11 Aug 2020 08:20:11 -0400 Message-Id: <1597148419-20629-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1597148419-20629-1-git-send-email-jsimmons@infradead.org> References: <1597148419-20629-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/23] lnet: Preferred NI logic breaks MR routing X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn Edge (final-hop) routers typically use the non-multi-rail destination (NMR_DST) send case. i.e. they treat the destination as non-multi-rail. The reason for this is that we do not want routers to modify the destination peer interface selected by the message originator. As a result of using the NMR_DST send case, edge routers set a preferred NI, and then continue to use that NI, because it's preferred, even if the NI goes down and the router has other healthy interfaces available to it. Routers do not need to use the preferred NI selection logic when they are forwarding a message, so modify the NMR_DST algorithm to allow routers to select any suitable local NI. HPE-bug-id: LUS-9045 WC-bug-id: https://jira.whamcloud.com/browse/LU-13712 Lustre-commit: ef6c35877b96c ("LU-13712 lnet: Preferred NI logic breaks MR routing") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/39168 Reviewed-by: Serguei Smirnov Reviewed-by: Neil Brown Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/lib-move.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c index aa6fe37..7c14518 100644 --- a/net/lnet/lnet/lib-move.c +++ b/net/lnet/lnet/lib-move.c @@ -2107,7 +2107,7 @@ struct lnet_ni * static int lnet_handle_any_local_nmr_dst(struct lnet_send_data *sd) { - int rc; + int rc = 0; /* sd->sd_best_lpni is already set to the final destination */ @@ -2122,7 +2122,23 @@ struct lnet_ni * return -EFAULT; } - rc = lnet_select_preferred_best_ni(sd); + if (sd->sd_msg->msg_routing) { + /* If I'm forwarding this message then I can choose any NI + * on the destination peer net + */ + sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, + sd->sd_peer, + sd->sd_best_lpni->lpni_peer_net, + sd->sd_md_cpt, + true); + if (!sd->sd_best_ni) { + CERROR("Unable to forward message to %s. No local NI available\n", + libcfs_nid2str(sd->sd_dst_nid)); + rc = -EHOSTUNREACH; + } + } else { + rc = lnet_select_preferred_best_ni(sd); + } if (!rc) rc = lnet_handle_send(sd);