diff mbox

opensm/osm_console.c: Support portstatus output for incorrect link speeds

Message ID 1361489963.24405.68.camel@auk59.llnl.gov (mailing list archive)
State Changes Requested
Delegated to: Hal Rosenstock
Headers show

Commit Message

Al Chu Feb. 21, 2013, 11:39 p.m. UTC
Usually coinciding with use of the force_link_speed configuration option,
some link speeds can be observed to be incorrect.  For example, a link
may be active at QDR even if SDR & DDR speeds are the only ones enabled.

Under these scenarios, the portstatus may incorrectly output port status,
confusing users.

A new port output status of "incorrect speed" differentiates ports in this
state from other states, so as to not confuse users.

Signed-off-by: Albert Chu <chu11@llnl.gov>
---
 opensm/osm_console.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

Comments

Hal Rosenstock Feb. 22, 2013, 7:33 p.m. UTC | #1
Hi Al,

On 2/21/2013 6:39 PM, Albert Chu wrote:
> Usually coinciding with use of the force_link_speed configuration option,
> some link speeds can be observed to be incorrect.  For example, a link
> may be active at QDR even if SDR & DDR speeds are the only ones enabled.

Are you referring to the fact that after setting the enabled speed, the
IB spec does not mandate a link renegotiation and hence one may not have
occurred so the enabled speed can be out of whack with the active speed ?

Also, isn't there a similar issue with extended speeds (and
force_link_speed_ext) ? If so, should that also be part of this patch ?

Same issue also is there for link width as it works the same way as link
speed. Maybe that too is yet another patch.

What do you think ?

-- Hal

> Under these scenarios, the portstatus may incorrectly output port status,
> confusing users.
> 
> A new port output status of "incorrect speed" differentiates ports in this
> state from other states, so as to not confuse users.
> 
> Signed-off-by: Albert Chu <chu11@llnl.gov>

<snip...>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Chu Feb. 22, 2013, 11:20 p.m. UTC | #2
On Fri, 2013-02-22 at 14:33 -0500, Hal Rosenstock wrote:
> Hi Al,
> 
> On 2/21/2013 6:39 PM, Albert Chu wrote:
> > Usually coinciding with use of the force_link_speed configuration option,
> > some link speeds can be observed to be incorrect.  For example, a link
> > may be active at QDR even if SDR & DDR speeds are the only ones enabled.
> 
> Are you referring to the fact that after setting the enabled speed, the
> IB spec does not mandate a link renegotiation and hence one may not have
> occurred so the enabled speed can be out of whack with the active speed ?

Now that I re-read my patch comment, it wasn't the clearest.

What you describe above very well could have occurred or maybe it was
just bad hardware.  The key was that ports were running at speeds (e.g.
QDR) that were not enabled (e.g. only SDR & DDR) and the current opensm
console was reporting invalid/incorrect error messages.

> Also, isn't there a similar issue with extended speeds (and
> force_link_speed_ext) ? If so, should that also be part of this patch ?
>
> Same issue also is there for link width as it works the same way as link
> speed. Maybe that too is yet another patch.
>
> What do you think ?

Yeah, I should have done those as well.  I was only solving the one
particular bug found.  I'll rework the patch.

Al

> -- Hal
> 
> > Under these scenarios, the portstatus may incorrectly output port status,
> > confusing users.
> > 
> > A new port output status of "incorrect speed" differentiates ports in this
> > state from other states, so as to not confuse users.
> > 
> > Signed-off-by: Albert Chu <chu11@llnl.gov>
> 
> <snip...>
diff mbox

Patch

diff --git a/opensm/osm_console.c b/opensm/osm_console.c
index 600007c..06c9360 100644
--- a/opensm/osm_console.c
+++ b/opensm/osm_console.c
@@ -712,6 +712,8 @@  typedef struct {
 	uint64_t ports_fdr;
 	uint64_t ports_edr;
 	uint64_t ports_unknown_speed;
+	uint64_t ports_incorrect_speed;
+	port_report_t *incorrect_speed_ports;
 	uint64_t ports_reduced_speed;
 	port_report_t *reduced_speed_ports;
 } fabric_stats_t;
@@ -773,7 +775,14 @@  static void __get_stats(cl_map_item_t * const p_map_item, void *context)
 			fs->ports_reduced_width++;
 		}
 
-		if ((enabled_speed ^ active_speed) > active_speed) {
+		/* incorrect speed usually due to problems with force_link_speed */
+		if (!(active_speed & enabled_speed)) {
+			__tag_port_report(&(fs->incorrect_speed_ports),
+					  cl_ntoh64(node->node_info.node_guid),
+					  port, node->print_desc);
+			fs->ports_incorrect_speed++;
+		}
+		else if ((enabled_speed ^ active_speed) > active_speed) {
 			__tag_port_report(&(fs->reduced_speed_ports),
 					  cl_ntoh64(node->node_info.node_guid),
 					  port, node->print_desc);
@@ -933,13 +942,18 @@  static void portstatus_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 		fprintf(out, "   %" PRIu64 " at 25.78125 Gbps\n", fs.ports_edr);
 
 	if (fs.ports_disabled + fs.ports_reduced_speed + fs.ports_reduced_width
-	    > 0) {
+	    + fs.ports_incorrect_speed > 0) {
 		fprintf(out, "\nPossible issues:\n");
 	}
 	if (fs.ports_disabled) {
 		fprintf(out, "   %" PRIu64 " disabled\n", fs.ports_disabled);
 		__print_port_report(out, fs.disabled_ports);
 	}
+	if (fs.ports_incorrect_speed) {
+		fprintf(out, "   %" PRIu64 " with incorrect speed\n",
+			fs.ports_incorrect_speed);
+		__print_port_report(out, fs.incorrect_speed_ports);
+	}
 	if (fs.ports_reduced_speed) {
 		fprintf(out, "   %" PRIu64 " with reduced speed\n",
 			fs.ports_reduced_speed);