[14/15] lnet: don't use hops to determine the route state

Message ID 1636384063-13838-15-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: update to OpenSFS tree Nov 8, 2021

Commit Message

James Simmons Nov. 8, 2021, 3:07 p.m. UTC
From: Serguei Smirnov <ssmirnov@whamcloud.com>

NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB

Assuming GW1 knows how to reach the tcp3 network and GW2 knows
how to reach the tcp1 network, it should be possible to add routes
on nodes A and B to reach tcp3 and tcp1, respectively, without
specifying hop=2, and then to lnetctl ping between them.
Changes introduced by LU-13785 treat the default hop count as
equivalent to an explicit hop=1 when determining route aliveness,
which causes routes created as described above to be considered
"down".

Fix this by checking lr_single_hop instead of LNET_UNDEFINED_HOPS,
so that the default hop setting doesn't prevent the multi-hop
scenario from working.

Fixes: 64d703ca18 ("lnet: Use lr_hops for avoid_asym_router_failure")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14945
Lustre-commit: 3f2844dc9333c8645 ("LU-14945 lnet: don't use hops to determine the route state")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44674
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Patch

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 7ce33eb..97e5ab2 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -318,7 +318,7 @@  bool lnet_is_route_alive(struct lnet_route *route)
 	 * routes the next-hop will not have the remote net.
 	 */
 	if (avoid_asym_router_failure &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
+	    (route->lr_hops == 1 || route->lr_single_hop)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -470,8 +470,7 @@  bool lnet_is_route_alive(struct lnet_route *route)
 
 		route->lr_single_hop = single_hop;
 		if (avoid_asym_router_failure &&
-		    (route->lr_hops == 1 ||
-		     route->lr_hops == LNET_UNDEFINED_HOPS))
+		    (route->lr_hops == 1 || route->lr_single_hop))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);
@@ -764,6 +763,14 @@  static void lnet_shuffle_seed(void)
 	lnet_peer_ni_decref_locked(lpni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* If avoid_asym_router_failure is enabled and hop count is not
+	 * set to 1 for a route that is actually single-hop, then the
+	 * feature will fail to prevent the router from being selected
+	 * if it is missing a NI on the remote network due to misconfiguration.
+	 */
+	if (avoid_asym_router_failure && hops == LNET_UNDEFINED_HOPS)
+		CWARN("Use hops = 1 for a single-hop route when avoid_asym_router_failure feature is enabled\n");
+
 	rc = 0;
 
 	if (!add_route) {