[16/24] lnet: Skip router discovery on send path

Message ID 1642124283-10148-17-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: update to OpenSFS Jan 13, 2022

Commit Message

James Simmons Jan. 14, 2022, 1:37 a.m. UTC
From: Chris Horn <chris.horn@hpe.com>

When the router checker is enabled, routes are regularly marked as out
of date w.r.t. discovery. This can cause upper level messages to be
delayed while the router undergoes discovery. We can avoid delaying
messages by relying on the router checker to initiate discovery of
routers. If we happen to send a message to a router before it has
been discovered, then the worst-case scenario is that the route is
actually down or we end up utilizing a subset of a multi-rail router's
interfaces. Both situations can be remedied by utilizing the
check_routers_before_use parameter.

Change the logic in lnet_handle_find_routed_path() so that we only
initiate discovery if the alive_router_check_interval is <= 0 (i.e.
router checker pings are disabled).
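
The gating described above can be sketched in isolation as follows. This
is a minimal user-space model under stated assumptions, not the kernel
code: stub_initiate_discovery(), discovery_calls,
find_routed_path_sketch() and sketch_self_check() are hypothetical names,
and the file-scope alive_router_check_interval only stands in for the
LNet tunable of the same name.

```c
#include <assert.h>

/* Stand-in for the LNet tunable; a value <= 0 means router checker
 * pings are disabled (assumption: modelled here as a plain int). */
int alive_router_check_interval = 60;

/* Counts how many times the stubbed discovery was requested. */
int discovery_calls;

/* Hypothetical stub. The real lnet_initiate_peer_discovery() takes the
 * gateway NI, the message and a CPT, and may delay the message. */
int stub_initiate_discovery(void)
{
	discovery_calls++;
	return 0;
}

/* Models only the decision made on the send path: discover the gateway
 * inline when the router checker is disabled, otherwise skip discovery
 * and leave it to the router checker. */
int find_routed_path_sketch(void)
{
	int rc;

	if (alive_router_check_interval <= 0) {
		rc = stub_initiate_discovery();
		if (rc)
			return rc;
	}

	return 0;
}

/* Exercises both branches of the check. */
void sketch_self_check(void)
{
	/* Router checker enabled: the send path skips discovery. */
	alive_router_check_interval = 60;
	discovery_calls = 0;
	assert(find_routed_path_sketch() == 0);
	assert(discovery_calls == 0);

	/* Router checker disabled: the send path initiates discovery. */
	alive_router_check_interval = 0;
	assert(find_routed_path_sketch() == 0);
	assert(discovery_calls == 1);
}
```

With the checker enabled the sketch never calls the discovery stub, which
mirrors the intent of not delaying upper level messages on the send path.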

WC-bug-id: https://jira.whamcloud.com/browse/LU-15275
Lustre-commit: c8e74c395d5634dbb ("LU-15275 lnet: Skip router discovery on send path")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/45684
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Patch

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 133397e..8d4fd4d 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -2104,13 +2104,23 @@  struct lnet_ni *
 		LASSERT(gw == gwni->lpni_peer_net->lpn_peer);
 	}
 
-	/* Discover this gateway if it hasn't already been discovered.
-	 * This means we might delay the message until discovery has
-	 * completed
+	/* If the router checker is not active then discover the gateway here.
+	 * This ensures we are able to take advantage of multi-rail routing, but
+	 * if the router checker is active then we do not unnecessarily delay
+	 * messages while the gateway is being checked by the dedicated monitor
+	 * thread.
+	 *
+	 * NB: We're only checking the alive_router_check_interval here, rather
+	 * than calling lnet_router_checker_active(), because the other
+	 * conditions that are checked by that function are either
+	 * irrelevant (the_lnet.ln_routing) or must be true (list of routers
+	 * is not empty)
 	 */
-	rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt);
-	if (rc)
-		return rc;
+	if (alive_router_check_interval <= 0) {
+		rc = lnet_initiate_peer_discovery(gwni, sd->sd_msg, sd->sd_cpt);
+		if (rc)
+			return rc;
+	}
 
 	if (!sd->sd_best_ni) {
 		lpn = gwni->lpni_peer_net;