From patchwork Thu Feb 27 21:13:22 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11410335
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C533138D
	for ; Thu, 27 Feb 2020 21:35:20 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 0511D24677
	for ; Thu, 27 Feb 2020 21:35:20 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0511D24677
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id A45F534A0AF;
	Thu, 27 Feb 2020 13:29:40 -0800 (PST)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
	by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id E3DBB21FF19
	for ; Thu, 27 Feb 2020 13:20:00 -0800 (PST)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
	by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 4012E8A84;
	Thu, 27 Feb 2020 16:18:17 -0500 (EST)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
	id 3E9A746D; Thu, 27 Feb 2020 16:18:17 -0500 (EST)
From: James Simmons
To: Andreas Dilger , Oleg Drokin , NeilBrown
Date: Thu, 27 Feb 2020 16:13:22 -0500
Message-Id: <1582838290-17243-335-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
References: <1582838290-17243-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 334/622] lnet: router sensitivity
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Amir Shehata , Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Amir Shehata

Introduce the router_sensitivity_percentage module parameter to control
the sensitivity of routers to failures. It defaults to 100% which means
a router interface needs to be fully healthy in order to be used.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11300
Lustre-commit: 2b59dae54efc ("LU-11300 lnet: router sensitivity")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/33449
Reviewed-by: Sebastien Buisson
Reviewed-by: Chris Horn
Signed-off-by: James Simmons
---
 include/linux/lnet/lib-lnet.h |  1 +
 net/lnet/lnet/router.c        | 50 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 80f6f8c..eae55d5 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -505,6 +505,7 @@ struct lnet_ni *
 extern unsigned int lnet_recovery_interval;
 extern unsigned int lnet_peer_discovery_disabled;
 extern unsigned int lnet_drop_asym_route;
+extern unsigned int router_sensitivity_percentage;
 extern int portal_rotor;
 
 int lnet_lib_init(void);
diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 8374ce1..40725d2 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -90,6 +90,56 @@ module_param(router_ping_timeout, int, 0644);
 MODULE_PARM_DESC(router_ping_timeout,
 		 "Seconds to wait for the reply to a router health query");
 
+/* A value between 0 and 100. 0 meaning that even if router's interfaces
+ * have the worse health still consider the gateway usable.
+ * 100 means that at least one interface on the route's remote net is 100%
+ * healthy to consider the route alive.
+ * The default is set to 100 to ensure we maintain the original behavior.
+ */
+unsigned int router_sensitivity_percentage = 100;
+static int rtr_sensitivity_set(const char *val,
+			       const struct kernel_param *kp);
+static struct kernel_param_ops param_ops_rtr_sensitivity = {
+	.set = rtr_sensitivity_set,
+	.get = param_get_int,
+};
+
+#define param_check_rtr_sensitivity(name, p) \
+		__param_check(name, p, int)
+module_param(router_sensitivity_percentage, rtr_sensitivity, 0644);
+MODULE_PARM_DESC(router_sensitivity_percentage,
+		 "How healthy a gateway should be to be used in percent");
+
+static int
+rtr_sensitivity_set(const char *val, const struct kernel_param *kp)
+{
+	int rc;
+	unsigned int *sen = (unsigned int *)kp->arg;
+	unsigned long value;
+
+	rc = kstrtoul(val, 0, &value);
+	if (rc) {
+		CERROR("Invalid module parameter value for 'router_sensitivity_percentage'\n");
+		return rc;
+	}
+
+	if (value < 0 || value > 100) {
+		CERROR("Invalid value: %lu for 'router_sensitivity_percentage'\n", value);
+		return -EINVAL;
+	}
+
+	/* The purpose of locking the api_mutex here is to ensure that
+	 * the correct value ends up stored properly.
+	 */
+	mutex_lock(&the_lnet.ln_api_mutex);
+
+	*sen = value;
+
+	mutex_unlock(&the_lnet.ln_api_mutex);
+
+	return 0;
+}
+
 int
 lnet_peers_start_down(void)
 {
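
Illustration (not part of the patch): the parse-then-range-check pattern
used by rtr_sensitivity_set() can be tried out in user space with the
rough sketch below. The helper name sensitivity_set() is hypothetical,
strtoul() stands in for the kernel's kstrtoul() (the two differ in how
they treat signs and whitespace, so the sketch rejects a leading '-'
explicitly), and the ln_api_mutex locking is left out. Since the parsed
value is an unsigned long, only the upper bound needs checking.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of rtr_sensitivity_set().
 * Parses val with base auto-detection and stores it in *out only
 * when it lies in the range 0..100. Returns 0 on success or a
 * negative errno value; *out is left untouched on failure.
 */
static int sensitivity_set(const char *val, unsigned int *out)
{
	char *end;
	unsigned long value;

	assert(out != NULL);

	/* Assumption: like kstrtoul(), refuse empty and negative input.
	 * strtoul() on its own would silently negate "-1".
	 */
	if (val == NULL || *val == '\0' || *val == '-')
		return -EINVAL;

	errno = 0;
	value = strtoul(val, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == val)
		return -EINVAL;		/* no digits consumed */

	/* Tolerate one trailing newline, as sysfs writes usually have. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	/* value is unsigned, so only the upper bound can be violated. */
	if (value > 100)
		return -EINVAL;

	*out = (unsigned int)value;
	return 0;
}
```

Calling sensitivity_set("50", &v) stores 50 and returns 0, while
sensitivity_set("101", &v) returns -EINVAL and leaves v unchanged.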