From patchwork Sun Mar 20 13:30:50 2022
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 12786514
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 20 Mar 2022 09:30:50 -0400
Message-Id: <1647783064-20688-37-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1647783064-20688-1-git-send-email-jsimmons@infradead.org>
References: <1647783064-20688-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 36/50] lnet: Avoid peer NI recovery for local interface
Cc: Chris Horn, Lustre Development List

From: Chris Horn

If a MR peer has a MR peer entry for itself (can happen if manually
created or discovery is run on itself for some reason), then it is
possible for it to put its own interfaces into peer recovery.
Problems with local interfaces should be handled via local NI
recovery.

HPE-bug-id: LUS-10661
WC-bug-id: https://jira.whamcloud.com/browse/LU-15398
Lustre-commit: fb5d7036ec356c825 ("LU-15398 lnet: Avoid peer NI recovery for local interface")
Signed-off-by: Chris Horn
Reviewed-on: https://review.whamcloud.com/45933
Reviewed-by: Serguei Smirnov
Reviewed-by: Andriy Skulysh
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/lnet/lib-msg.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 88f017b..f476975 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -877,6 +877,12 @@
 			if (!lnet_isrouter(lpni))
 				handle_remote_health = false;
 		}
+		/* Do not put my interfaces into peer NI recovery. They should
+		 * be handled with local NI recovery.
+		 */
+		if (handle_remote_health && lpni &&
+		    lnet_nid_to_ni_locked(&lpni->lpni_nid, 0))
+			handle_remote_health = false;
 		lnet_net_unlock(0);
 	}
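
For readers without the LNet tree at hand, the decision this patch adds can be sketched in userspace C. This is only an illustration under stated assumptions: the demo_* types, the fixed table of local NIDs, and demo_nid_is_local() are hypothetical stand-ins (the last one plays the role of lnet_nid_to_ni_locked()), not the kernel definitions, and no locking is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for LNet types; not the kernel definitions. */
struct demo_nid {
	char addr[32];		/* e.g. "10.0.0.1@tcp" */
};

struct demo_peer_ni {
	struct demo_nid lpni_nid;
};

/* Hypothetical table of NIDs configured on this node. */
static const struct demo_nid demo_local_nids[] = {
	{ "10.0.0.1@tcp" },
	{ "10.0.1.1@o2ib" },
};

/*
 * Stand-in for lnet_nid_to_ni_locked(): returns true when @nid matches
 * one of this node's own interfaces.
 */
static bool demo_nid_is_local(const struct demo_nid *nid)
{
	size_t n = sizeof(demo_local_nids) / sizeof(demo_local_nids[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(demo_local_nids[i].addr, nid->addr) == 0)
			return true;
	}
	return false;
}

/*
 * Same shape as the check added by the patch: if remote health handling
 * is still requested and the peer NI is really one of our own
 * interfaces, turn remote health handling off so the interface is left
 * to local NI recovery.
 */
static bool demo_filter_remote_health(bool handle_remote_health,
				      const struct demo_peer_ni *lpni)
{
	if (handle_remote_health && lpni &&
	    demo_nid_is_local(&lpni->lpni_nid))
		handle_remote_health = false;

	return handle_remote_health;
}
```

With the demo table above, a peer NI whose NID is 10.0.0.1@tcp has remote health handling cleared, while a peer NI for an address that is not in the table keeps whatever value was passed in. A NULL lpni is passed through unchanged, matching the "&& lpni" guard in the patch.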