From patchwork Thu Feb 27 21:13:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 11410211 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7855692A for ; Thu, 27 Feb 2020 21:32:47 +0000 (UTC) Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 608AB24677 for ; Thu, 27 Feb 2020 21:32:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 608AB24677 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 29C2D349B3F; Thu, 27 Feb 2020 13:27:48 -0800 (PST) X-Original-To: lustre-devel@lists.lustre.org Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39]) by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 8F71921CB97 for ; Thu, 27 Feb 2020 13:19:58 -0800 (PST) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 29F298A5C; Thu, 27 Feb 2020 16:18:17 -0500 (EST) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 28802468; Thu, 27 Feb 2020 16:18:17 -0500 (EST) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Thu, 27 Feb 2020 16:13:14 -0500 Message-Id: <1582838290-17243-327-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 326/622] lnet: Ensure md is detached when msg is not committed X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn It's possible for lnet_is_health_check() to return "true" when the message has not hit the network. In this situation the message is freed without detaching the MD. As a result, requests do not receive their unlink events and these requests are stuck forever. A little cleanup is included here: - The value of lnet_is_health_check() is only used in one place, so we don't need to save the result of it in a variable. - We don't need separate logic to detach the md when the send was successful. We'll fall through to the finalizing code after incrementing the health counters Cray-bug-id: LUS-7239 WC-bug-id: https://jira.whamcloud.com/browse/LU-12199 Lustre-commit: b65f3a1767ae ("LU-12199 lnet: Ensure md is detached when msg is not committed") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/34885 Reviewed-by: Olaf Weber Reviewed-by: Amir Shehata Signed-off-by: James Simmons --- net/lnet/lnet/lib-msg.c | 66 +++++++++++++++---------------------------------- 1 file changed, 20 insertions(+), 46 deletions(-) diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c index ad35c3d..dbd8de4 100644 --- a/net/lnet/lnet/lib-msg.c +++ b/net/lnet/lnet/lib-msg.c @@ -784,16 +784,6 @@ msg->msg_md = NULL; } -static void -lnet_detach_md(struct lnet_msg *msg, int status) -{ - int cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); - - lnet_res_lock(cpt); - lnet_msg_detach_md(msg, cpt, status); - lnet_res_unlock(cpt); -} - static bool lnet_is_health_check(struct lnet_msg *msg) { @@ -881,7 +871,6 @@ int cpt; int rc; int i; - bool hc; LASSERT(!in_interrupt()); @@ -890,36 +879,7 @@ msg->msg_ev.status = status; - /* if the message is successfully sent, no need to keep the MD around */ - if (msg->msg_md && !status) - lnet_detach_md(msg, status); - -again: - hc = lnet_is_health_check(msg); - - /* the MD would've been detached from the message if it was - * successfully sent. However, if it wasn't successfully sent the - * MD would be around. And since we recalculate whether to - * health check or not, it's possible that we change our minds and - * we don't want to health check this message. In this case also - * free the MD. - * - * If the message is successful we're going to - * go through the lnet_health_check() function, but that'll just - * increment the appropriate health value and return. - */ - if (msg->msg_md && !hc) - lnet_detach_md(msg, status); - - rc = 0; - if (!msg->msg_tx_committed && !msg->msg_rx_committed) { - /* not committed to network yet */ - LASSERT(!msg->msg_onactivelist); - kfree(msg); - return; - } - - if (hc) { + if (lnet_is_health_check(msg)) { /* Check the health status of the message. If it has one * of the errors that we're supposed to handle, and it has * not timed out, then @@ -932,13 +892,26 @@ * put on the resend queue. */ if (!lnet_health_check(msg)) + /* Message is queued for resend */ return; + } - /* if we get here then we need to clean up the md because we're - * finalizing the message. - */ - if (msg->msg_md) - lnet_detach_md(msg, status); + /* We're not going to resend this message so detach its MD and invoke + * the appropriate callbacks + */ + if (msg->msg_md) { + cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie); + lnet_res_lock(cpt); + lnet_msg_detach_md(msg, cpt, status); + lnet_res_unlock(cpt); + } + +again: + if (!msg->msg_tx_committed && !msg->msg_rx_committed) { + /* not committed to network yet */ + LASSERT(!msg->msg_onactivelist); + kfree(msg); + return; } /* @@ -972,6 +945,7 @@ container->msc_finalizers[my_slot] = current; + rc = 0; while ((msg = list_first_entry_or_null(&container->msc_finalizing, struct lnet_msg, msg_list)) != NULL) {