| Message ID | 20110720161655.1ed38052.weiny2@llnl.gov (mailing list archive) |
|---|---|
| State | Accepted, archived |
| Delegated to: | Ira Weiny |
On Wed, Jul 20, 2011 at 04:16:55PM -0700, Ira Weiny wrote:
>
> The old algorithm would attempt to extend a DR path through the CA and fail.
> The new algorithm retracts the path properly.
>
> This does not work when combined routing is used with the CA as a starting
> point. :-(

The ibtool version of this switches the starting LID with the connected
switch in this case. Having that general functionality means you can easily
write a traceroute routine with this common code.

Note, doing tricks like this with the DR path screws up the localPortNum
value, so if you ever read Port/NodeInfo using a path that has been
manipulated like this it needs a fixup step.

Also, there is a bug: you can't DLID route to the local HCA and then use
that as a source of a DR path, even though intuitively that should work.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Jul 2011 16:34:51 -0700
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Wed, Jul 20, 2011 at 04:16:55PM -0700, Ira Weiny wrote:
> >
> > The old algorithm would attempt to extend a DR path through the CA and fail.
> > The new algorithm retracts the path properly.
> >
> > This does not work when combined routing is used with the CA as a starting
> > point. :-(
>
> The ibtool version of this switches the starting LID with the
> connected switch in this case..

Given a LID (resolved from a PR query of a GUID) for a CA port, how do you
know the connected switch? Via SA query?

> Having that general functionality
> means you can easily write a traceroute routine with this common
> code.

Yes, that would be nice. But again it requires more SA queries.

> Note, doing tricks like this with the DR path screws up the
> localPortNum value, so if you ever read Port/NodeInfo using a path
> that has been manipulated like this it needs a fixup step.

Not sure I follow you. To be clear, this patch alters the path like this:

Given DR path 0,1,3,4, where 0,1,3,4 ends at a CA,

"retract" the path to 0,1,3 and query that node as well as 0,1,3,4.
Resolve the port linkage and return that "fabric" to the user.

Given a combined route DLID+path, the CA is resolved and returned. There
is, as you say below, no way to resolve (via SMP queries) the connected
node without scanning the entire fabric and resolving the connection.

> Also, there is a bug, you can't DLID route to the local HCA and then
> use that as a source of a DR path, even though intuitively that should
> work.

A bug in the code?

Ira

> Jason
On Wed, Jul 20, 2011 at 04:53:36PM -0700, Ira Weiny wrote:

> > The ibtool version of this switches the starting LID with the
> > connected switch in this case..
>
> Given a LID (resolved from a PR query of a GUID) for a CA port, how
> do you know the connected switch? Via SA query?

Yes. There is also a general question of when and if various tools should
use SMPs vs the SA...

> > Note, doing tricks like this with the DR path screws up the
> > localPortNum value, so if you ever read Port/NodeInfo using a path
> > that has been manipulated like this it needs a fixup step.
>
> Not sure I follow you. To be clear this patch alters the path like this:
>
> Given DR path 0,1,3,4, where 0,1,3,4 ends at a CA
>
> "retract" the path to 0,1,3 and query that node as well as 0,1,3,4.
> Resolve the port linkage and return that "fabric" to the user.

If you issue a NodeInfo SMP to DR path 0,1,3,4 you will get a localPortNum
of 1. If you then algorithmically want to construct a DR path of 0,1,3,4,1
to reach the device connected to the CA port, your algorithm will adjust it
to 0,1,3, but a NodeInfo SMP on that path will not return a localPortNum of
3, which would be expected from the construction of the original path.

It depends on how your other algorithms are using this feature, but, e.g.
for a tracing application that wants to print each hop along the path,
messing up the localPortNum during the adjustment is bad news.

> Given a combined route DLID+path the CA is resolved and returned.
> There is, as you say below, no way to resolve (via SMP queries) the
> connected node without scanning the entire fabric and resolving the
> connection.

You trace through the LFT from the local end port to the target CA DLID
and that gives you a path to the switch.

> > Also, there is a bug, you can't DLID route to the local HCA and then
> > use that as a source of a DR path, even though intuitively that should
> > work.
>
> A bug in the code?

In Linux or the HCA, I think.
Jason
On 7/20/2011 7:34 PM, Jason Gunthorpe wrote:

> Also, there is a bug, you can't DLID route to the local HCA and then
> use that as a source of a DR path, even though intuitively that should
> work.

Just the local xCA or any xCA?
On Fri, Jul 22, 2011 at 09:21:13AM -0400, Hal Rosenstock wrote:

> On 7/20/2011 7:34 PM, Jason Gunthorpe wrote:
> > Also, there is a bug, you can't DLID route to the local HCA and then
> > use that as a source of a DR path, even though intuitively that should
> > work.
>
> Just the local xCA or any xCA ?

Well, both, but I was under the impression that the latter was specified
behavior; I haven't looked it up.

Jason
On Fri, 22 Jul 2011 10:29:46 -0700
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Fri, Jul 22, 2011 at 09:21:13AM -0400, Hal Rosenstock wrote:
> > On 7/20/2011 7:34 PM, Jason Gunthorpe wrote:
> > > Also, there is a bug, you can't DLID route to the local HCA and then
> > > use that as a source of a DR path, even though intuitively that should
> > > work.
> >
> > Just the local xCA or any xCA ?
>
> Well both, but I was under the impression that latter was specified
> behavior, but I haven't looked it up.

I agree it would be nice if one could issue a combined path and "scan
through" any xCA on the fabric. However, I don't think it is allowed, nor
correct to do so.

The informative text on pg 805 ln 23 specifically states that the initial
LID routing is from "source node to source switch". Furthermore, C14-7
(continued on pg 809, first bullet) specifically indicates that the DLID
for an initial LID routed DR SMP "shall be set to the LID of the source
switch in the DR part."
                 ^^^^^^

Even if this is considered vague, there would be a problem if an xCA were
allowed. C14-9 "case 1" would allow an SM to "scan through" an xCA to
another subnet. Or actually, it would result in the SMP being "lost" in the
other subnet. For example, take the following:

        "Local subnet"                    "Other subnet"

   xCA1 ----> switch ----> xCA2 p1 : xCA2 p2 ----> switch -> ...
   LIDs 2     3            4         1             2

If I issue the SMP from xCA1 with DrSLID = 3 and a DR path of 0,2, xCA2
will alter the fields as per C14-9 and issue the SMP out port 2. Then, when
the response comes back to xCA2 p2, it will attempt to send the packet to
LID 2 (DrSLID == 2 from the original query) and the SMP will end up at the
switch on the right in "other subnet", _not_ at xCA1.

So although I don't see a specific compliance statement (and I agree it
would be nice) I think it is safe to say that it should be restricted.
Would it be more clear for C14-9 to specifically exclude this case?

Ira

> Jason
On 7/22/2011 1:29 PM, Jason Gunthorpe wrote:

> On Fri, Jul 22, 2011 at 09:21:13AM -0400, Hal Rosenstock wrote:
>> On 7/20/2011 7:34 PM, Jason Gunthorpe wrote:
>>> Also, there is a bug, you can't DLID route to the local HCA and then
>>> use that as a source of a DR path, even though intuitively that should
>>> work.
>>
>> Just the local xCA or any xCA ?
>
> Well both,

That's what I wanted to be sure of.

> but I was under the impression that latter was specified
> behavior, but I haven't looked it up.

The spec doesn't support this.

-- Hal

> Jason
On Fri, Jul 22, 2011 at 03:00:49PM -0700, Ira Weiny wrote:

> I agree it would be nice if one could issue a combined path and "scan through"
> any xCA on the fabric. However, I don't think it is allowed, nor correct to
> do so.

Well, it is explicitly not allowed:

   2) If Hop Count is non zero and Hop Pointer is in the range
   1 <= Hop Pointer < Hop Count, this SMI is an intermediate hop in the
   directed route portion of the path. If the node is not a switch, the
   SMI shall discard the SMP, otherwise the SMI shall alter the contents
   of the directed route SMP as follows:

> Even if this is considered vague there would be a problem if an xCA was allowed.

That sort of behavior is explicitly disallowed in other cases, like
reading port info.

Jason
On Fri, 22 Jul 2011 15:13:49 -0700
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Fri, Jul 22, 2011 at 03:00:49PM -0700, Ira Weiny wrote:
>
> > I agree it would be nice if one could issue a combined path and "scan through"
> > any xCA on the fabric. However, I don't think it is allowed, nor correct to
> > do so.
>
> Well, it is explicitly not allowed:
>
>    2) If Hop Count is non zero and Hop Pointer is in the range
>    1 <= Hop Pointer < Hop Count, this SMI is an intermediate hop in the
>    directed route portion of the path. If the node is not a switch, the
>    SMI shall discard the SMP, otherwise the SMI shall alter the contents
>    of the directed route SMP as follows:

I guess I was confused because in the example I gave the Hop Count would be
zero, so the above did not apply. That is why I went off looking for other
explanations.

Regardless, I think we are all in agreement. :-D

Ira

> > Even if this is considered vague there would be a problem if an xCA was allowed.
>
> That sort of behavior is explicitly disallowed in other cases, like
> reading port info.
>
> Jason
diff --git a/libibnetdisc/src/ibnetdisc.c b/libibnetdisc/src/ibnetdisc.c
index b7fa75a..07f2a3f 100644
--- a/libibnetdisc/src/ibnetdisc.c
+++ b/libibnetdisc/src/ibnetdisc.c
@@ -56,8 +56,13 @@
 #include "chassis.h"
 
 /* forward declare */
+struct ni_cbdata
+{
+	ibnd_node_t *node;
+	int port_num;
+};
 static int query_node_info(smp_engine_t * engine, ib_portid_t * portid,
-			   ibnd_node_t * node);
+			   struct ni_cbdata * cbdata);
 
 static int recv_switch_info(smp_engine_t * engine, ibnd_smp_t * smp,
 			    uint8_t * mad, void *cb_data)
@@ -87,6 +92,26 @@ static int add_port_to_dpath(ib_dr_path_t * path, int nextport)
 	return path->cnt;
 }
 
+static int retract_dpath(smp_engine_t * engine, ib_portid_t * portid)
+{
+	ibnd_scan_t *scan = engine->user_data;
+	ibnd_fabric_t *fabric = scan->fabric;
+
+	if (scan->cfg->max_hops &&
+	    fabric->maxhops_discovered > scan->cfg->max_hops)
+		return 0;
+
+	/* this may seem wrong but the only time we would retract the path is
+	 * if the user specified a CA for the DR path and we are retracting
+	 * from that to find the node it is connected to.  This counts as a
+	 * positive hop discovered
+	 */
+	fabric->maxhops_discovered++;
+	portid->drpath.p[portid->drpath.cnt] = 0;
+	portid->drpath.cnt--;
+	return 1;
+}
+
 static int extend_dpath(smp_engine_t * engine, ib_portid_t * portid,
 			int nextport)
 {
@@ -115,8 +140,9 @@ static int extend_dpath(smp_engine_t * engine, ib_portid_t * portid,
 		return -1;
 	}
 
-	if ((unsigned) portid->drpath.cnt > fabric->maxhops_discovered)
-		fabric->maxhops_discovered = portid->drpath.cnt;
+	if (((unsigned) portid->drpath.cnt - scan->initial_hops) >
+	    fabric->maxhops_discovered)
+		fabric->maxhops_discovered++;
 
 	return 1;
 }
@@ -202,9 +228,26 @@ static int recv_port_info(smp_engine_t * engine, ibnd_smp_t * smp,
 	    == IB_PORT_PHYS_STATE_LINKUP
 	    && ((node->type == IB_NODE_SWITCH && port_num != local_port) ||
 		(node == fabric->from_node && port_num == fabric->from_portnum))) {
+
+		int rc = 0;
 		ib_portid_t path = smp->path;
-		if (extend_dpath(engine, &path, port_num) > 0)
-			query_node_info(engine, &path, node);
+
+		if (node->type != IB_NODE_SWITCH &&
+		    node == fabric->from_node &&
+		    path.drpath.cnt > 1)
+			rc = retract_dpath(engine, &path);
+		else {
+			/* we can't proceed through an HCA with DR */
+			if (path.lid == 0 || node->type == IB_NODE_SWITCH)
+				rc = extend_dpath(engine, &path, port_num);
+		}
+
+		if (rc > 0) {
+			struct ni_cbdata * cbdata = malloc(sizeof(*cbdata));
+			cbdata->node = node;
+			cbdata->port_num = port_num;
+			query_node_info(engine, &path, cbdata);
+		}
 	}
 
 	return 0;
@@ -255,11 +298,6 @@ static ibnd_node_t *create_node(smp_engine_t * engine, ib_portid_t * path,
 	return rc;
 }
 
-static int get_last_port(ib_portid_t * path)
-{
-	return path->drpath.p[path->drpath.cnt];
-}
-
 static void link_ports(ibnd_node_t * node, ibnd_port_t * port,
 		       ibnd_node_t * remotenode, ibnd_port_t * remoteport)
 {
@@ -294,7 +332,9 @@ static int recv_node_info(smp_engine_t * engine, ibnd_smp_t * smp,
 	ibnd_fabric_t *fabric = scan->fabric;
 	int i = 0;
 	uint8_t *node_info = mad + IB_SMP_DATA_OFFS;
-	ibnd_node_t *rem_node = cb_data;
+	struct ni_cbdata *ni_cbdata = (struct ni_cbdata *)cb_data;
+	ibnd_node_t *rem_node = NULL;
+	int rem_port_num = 0;
 	ibnd_node_t *node;
 	int node_is_new = 0;
 	uint64_t node_guid = mad_get_field64(node_info, 0, IB_NODE_GUID_F);
@@ -302,6 +342,12 @@ static int recv_node_info(smp_engine_t * engine, ibnd_smp_t * smp,
 	int port_num = mad_get_field(node_info, 0, IB_NODE_LOCAL_PORT_F);
 	ibnd_port_t *port = NULL;
 
+	if (ni_cbdata) {
+		rem_node = ni_cbdata->node;
+		rem_port_num = ni_cbdata->port_num;
+		free(ni_cbdata);
+	}
+
 	node = ibnd_find_node_guid(fabric, node_guid);
 	if (!node) {
 		node = create_node(engine, &smp->path, node_info);
@@ -333,8 +379,6 @@ static int recv_node_info(smp_engine_t * engine, ibnd_smp_t * smp,
 		fabric->from_portnum = port_num;
 	} else {
 		/* link ports... */
-		int rem_port_num = get_last_port(&smp->path);
-
 		if (!rem_node->ports[rem_port_num]) {
 			IBND_ERROR("Internal Error; "
 				   "Node(%p) 0x%" PRIx64
@@ -363,11 +407,11 @@ static int recv_node_info(smp_engine_t * engine, ibnd_smp_t * smp,
 }
 
 static int query_node_info(smp_engine_t * engine, ib_portid_t * portid,
-			   ibnd_node_t * node)
+			   struct ni_cbdata * cbdata)
 {
 	IBND_DEBUG("Query Node Info; %s\n", portid2str(portid));
 	return issue_smp(engine, portid, IB_ATTR_NODE_INFO, 0,
-			 recv_node_info, node);
+			 recv_node_info, (void *)cbdata);
 }
 
 ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t * fabric, uint64_t guid)
@@ -430,8 +474,7 @@ void add_to_type_list(ibnd_node_t * node, ibnd_fabric_t * fabric)
 	}
 }
 
-static int set_config(struct ibnd_config *config, struct ibnd_config *cfg,
-		      int initial_hops)
+static int set_config(struct ibnd_config *config, struct ibnd_config *cfg)
 {
 	if (!config)
 		return (-EINVAL);
@@ -445,8 +488,6 @@ static int set_config(struct ibnd_config *config, struct ibnd_config *cfg,
 		config->timeout_ms = DEFAULT_TIMEOUT;
 	if (!config->retries)
 		config->retries = DEFAULT_RETRIES;
-	if (config->max_hops)
-		config->max_hops += initial_hops;
 
 	return (0);
 }
@@ -467,7 +508,7 @@ ibnd_fabric_t *ibnd_discover_fabric(char * ca_name, int ca_port,
 	if (!from)
 		from = &my_portid;
 
-	if (set_config(&config, cfg, from->drpath.cnt)) {
+	if (set_config(&config, cfg)) {
 		IBND_ERROR("Invalid ibnd_config\n");
 		return NULL;
 	}
@@ -483,6 +524,7 @@ ibnd_fabric_t *ibnd_discover_fabric(char * ca_name, int ca_port,
 	memset(&scan.selfportid, 0, sizeof(scan.selfportid));
 	scan.fabric = fabric;
 	scan.cfg = &config;
+	scan.initial_hops = from->drpath.cnt;
 
 	if (smp_engine_init(&engine, ca_name, ca_port, &scan, &config)) {
 		free(fabric);
@@ -505,6 +547,7 @@ ibnd_fabric_t *ibnd_discover_fabric(char * ca_name, int ca_port,
 		goto error;
 
 	fabric->total_mads_used = engine.total_smps;
+	fabric->maxhops_discovered += scan.initial_hops;
 
 	if (group_nodes(fabric))
 		goto error;
diff --git a/libibnetdisc/src/internal.h b/libibnetdisc/src/internal.h
index 3c599ec..80918c4 100644
--- a/libibnetdisc/src/internal.h
+++ b/libibnetdisc/src/internal.h
@@ -62,6 +62,7 @@ typedef struct ibnd_scan {
 	ibnd_fabric_t *fabric;
 	struct ibnd_config *cfg;
 	struct ibmad_port *ibmad_port;
+	unsigned initial_hops;
 } ibnd_scan_t;
 
 typedef struct ibnd_smp ibnd_smp_t;
The old algorithm would attempt to extend a DR path through the CA and fail.
The new algorithm retracts the path properly.

This does not work when combined routing is used with the CA as a starting
point. :-(

Signed-off-by: Ira Weiny <weiny2@llnl.gov>
---
 libibnetdisc/src/ibnetdisc.c |   83 ++++++++++++++++++++++++++++++++----------
 libibnetdisc/src/internal.h  |    1 +
 2 files changed, 64 insertions(+), 20 deletions(-)