From patchwork Thu Feb 27 21:09:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11409789
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BB713159A
 for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id A43D5246A1
 for ; Thu, 27 Feb 2020 21:22:18 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A43D5246A1
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 70E10348845;
 Thu, 27 Feb 2020 13:20:52 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 6C43B21FA7D
 for ; Thu, 27 Feb 2020 13:18:41 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
 by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 37816EF3;
 Thu, 27 Feb 2020 16:18:14 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
 id 35DA546F; Thu, 27 Feb 2020 16:18:14 -0500 (EST)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Thu, 27 Feb 2020 16:09:11 -0500
Message-Id: <1582838290-17243-84-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 083/622] lnet: add retry count
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
Cc: Amir Shehata, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Amir Shehata

Added a module parameter to define the number of retries on a
message. It defaults to 0, which means no retries will be
attempted. Each message will keep track of the number of times
it has been retransmitted. When queuing it on the resend queue,
the retry count will be checked and if it's exceeded, then the
message will be finalized.

WC-bug-id: https://jira.whamcloud.com/browse/LU-9120
Lustre-commit: 20e23980eae2 ("LU-9120 lnet: add retry count")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/32769
Reviewed-by: Sonia Sharma
Reviewed-by: Olaf Weber
Signed-off-by: James Simmons
---
 include/linux/lnet/lib-lnet.h  | 1 +
 include/linux/lnet/lib-types.h | 2 ++
 net/lnet/lnet/api-ni.c         | 5 +++++
 net/lnet/lnet/lib-msg.c        | 8 +++++++-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index b8ca114..ace0d51 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -478,6 +478,7 @@ struct lnet_ni *
 struct lnet_net *lnet_get_net_locked(u32 net_id);
 
 extern unsigned int lnet_transaction_timeout;
+extern unsigned int lnet_retry_count;
 extern unsigned int lnet_numa_range;
 extern unsigned int lnet_health_sensitivity;
 extern unsigned int lnet_peer_discovery_disabled;
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 19b83a4..1108e3b 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -103,6 +103,8 @@ struct lnet_msg {
 	enum lnet_msg_hstatus msg_health_status;
 	/* This is a recovery message */
 	bool msg_recovery;
+	/* the number of times a transmission has been retried */
+	int msg_retry_count;
 	/* flag to indicate that we do not want to resend this message */
 	bool msg_no_resend;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 97d9be5..a54fe2c 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -116,6 +116,11 @@ struct lnet the_lnet = {
 MODULE_PARM_DESC(lnet_transaction_timeout,
 		 "Time in seconds to wait for a REPLY or an ACK");
 
+unsigned int lnet_retry_count;
+module_param(lnet_retry_count, uint, 0444);
+MODULE_PARM_DESC(lnet_retry_count,
+		 "Maximum number of times to retry transmitting a message");
+
 /*
  * This sequence number keeps track of how many times DLC was used to
  * update the local NIs. It is incremented when a NI is added or
diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 046923b..9841e14 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -556,7 +556,8 @@
 }
 
 /* Do a health check on the message:
- * return -1 if we're not going to handle the error
+ * return -1 if we're not going to handle the error or
+ * if we've reached the maximum number of retries.
  * success case will return -1 as well
  * return 0 if it the message is requeued for send
  */
@@ -646,6 +647,11 @@
 	if (msg->msg_no_resend)
 		return -1;
 
+	/* check if the message has exceeded the number of retries */
+	if (msg->msg_retry_count >= lnet_retry_count)
+		return -1;
+	msg->msg_retry_count++;
+
 	lnet_net_lock(msg->msg_tx_cpt);
 
 	/* remove message from the active list and reset it in preparation