Message ID | 51CB5BF1.1090601@nasa.gov (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Hal Rosenstock |
HI Jeff,

On 6/26/2013 5:24 PM, Jeff Becker wrote:
> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
> modifications to opensm that we use at NASA. Following extensive testing
> of these applied to opensm 3.3.13 (the version we run here), I have
> ported these to top of tree opensm, and have tested them on a small
> cluster.

Thanks for getting this done! For future reference, patches should be
sent as plain text as this makes it easier to comment.

> The first patch modifies the console logflush command to take "on" or
> "off" as an argument for toggling.

Thanks. Applied.

> The second (more extensive) patch
> adds a command line option to specify a file in which each line contains
> a switch GUID/port pair to be ignored by opensm. The idea is to specify
> this file when you start opensm (it can be empty), and add ports to
> ignore (one per line for each end of a connection) to the file. At the
> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
> without including the ignored links. We use this for replacing cables,
> as well as for system expansion (adding new racks).

I'll comment on this one later.

-- Hal

> Please let me know if you have any questions/issues with these. Thanks.
>
> -jeff

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Hal,

I have some testing info about the second patch below.

On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
> HI Jeff,
>
> On 6/26/2013 5:24 PM, Jeff Becker wrote:
>> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
>> modifications to opensm that we use at NASA. Following extensive testing
>> of these applied to opensm 3.3.13 (the version we run here), I have
>> ported these to top of tree opensm, and have tested them on a small
>> cluster.
> Thanks for getting this done! For future reference, patches should be
> sent as plain text as this makes it easier to comment.

OK. So I just send the output of git-format-patch directly? It appears
to be formatted properly.

>
>> The first patch modifies the console logflush command to take "on" or
>> "off" as an argument for toggling.
> Thanks. Applied.
>
>> The second (more extensive) patch
>> adds a command line option to specify a file in which each line contains
>> a switch GUID/port pair to be ignored by opensm. The idea is to specify
>> this file when you start opensm (it can be empty), and add ports to
>> ignore (one per line for each end of a connection) to the file. At the
>> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
>> without including the ignored links. We use this for replacing cables,
>> as well as for system expansion (adding new racks).
> I'll comment on this one later.

Dale (cc'd) did some testing with my patch on Pleiades in preparation
for a system augmentation (new racks) happening soon. He found that the
SM correctly produces routes that do not use links marked to be ignored,
but when you then remove or disable the links, the SM re-routes the
fabric anyway and comes up with different routes than before. This
rerouting causes problems with existing connections. There also appears
to be a bookkeeping problem such that some of these links get added to
the SM's "light sampling" list and never get removed. This ties up
outstanding MAD packet slots, causing the SM to become unresponsive for
several seconds every time it reviews its light sampling list.

I'm working on fixing these. I'll take care of the second problem
(incorrectly getting added to the light sampling list) first. Is it
possible this problem is related to the re-routing on port disable
problem? Anyhow, if you have any specific comments about these issues,
that would be great.

Thanks, and have a great Fourth of July.

-jeff

>
> -- Hal
>
>> Please let me know if you have any questions/issues with these. Thanks.
>>
>> -jeff
Hi again Jeff,

On 7/3/2013 12:20 PM, Jeff Becker wrote:
> Hi Hal,
>
> I have some testing info about the second patch below.
>
> On 07/03/2013 03:23 AM, Hal Rosenstock wrote:
>> HI Jeff,
>>
>> On 6/26/2013 5:24 PM, Jeff Becker wrote:
>>> Hi Hal. At the OFA workshop, I mentioned that I've been working on some
>>> modifications to opensm that we use at NASA. Following extensive testing
>>> of these applied to opensm 3.3.13 (the version we run here), I have
>>> ported these to top of tree opensm, and have tested them on a small
>>> cluster.
>> Thanks for getting this done! For future reference, patches should be
>> sent as plain text as this makes it easier to comment.
>
> OK. So I just send the output of git-format-patch directly? It appears
> to be formatted properly.
>>
>>> The first patch modifies the console logflush command to take "on" or
>>> "off" as an argument for toggling.
>> Thanks. Applied.
>>
>>> The second (more extensive) patch
>>> adds a command line option to specify a file in which each line contains
>>> a switch GUID/port pair to be ignored by opensm. The idea is to specify
>>> this file when you start opensm (it can be empty), and add ports to
>>> ignore (one per line for each end of a connection) to the file. At the
>>> next heavy sweep (or HUP) the sm will reprogram the forwarding tables
>>> without including the ignored links. We use this for replacing cables,
>>> as well as for system expansion (adding new racks).
>> I'll comment on this one later.
>
> Dale (cc'd) did some testing with my patch on Pleiades in preparation
> for a system augmentation (new racks) happening soon. He found that the
> SM correctly produces routes that do not use links marked to be ignored,
> but when you then remove or disable the links, the SM re-routes the
> fabric anyway and comes up with different routes than before. This
> rerouting causes problems with existing connections. There also appears
> to be a bookkeeping problem such that some of these links get added to
> the SM's "light sampling" list and never get removed. This ties up
> outstanding MAD packet slots, causing the SM to become unresponsive for
> several seconds every time it reviews its light sampling list.

Yes, this is one of several issues with using this approach. I plan on
detailing these later as well as posting a slightly different approach
for this but that may take a little longer...

> I'm working on fixing these. I'll take care of the second problem
> (incorrectly getting added to the light sampling list) first. Is it
> possible this problem is related to the re-routing on port disable
> problem? Anyhow, if you have any specific comments about these issues,
> that would be great.

> Thanks, and have a great Fourth of July.

Thanks; you too!

-- Hal

> -jeff
>>
>> -- Hal
>>
>>> Please let me know if you have any questions/issues with these. Thanks.
>>>
>>> -jeff
>
>
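The thread describes the ignore file only as having "a switch GUID/port pair" on each line; the second patch itself is not shown here. As a rough illustration of what reading one such line might involve, the sketch below assumes a whitespace-separated `<guid> <port>` layout with the GUID in hex. The struct `ignored_port` and the function `parse_ignore_line` are hypothetical names invented for this example and are not taken from opensm.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; not taken from the opensm sources. */
struct ignored_port {
	uint64_t guid;	/* switch node GUID */
	unsigned port;	/* port number on that switch */
};

/*
 * Parse one line of the ASSUMED form "<guid> <port>", e.g.
 * "0x0002c90200123456 12". Returns 0 on success, -1 if malformed.
 */
static int parse_ignore_line(const char *line, struct ignored_port *out)
{
	char *end;
	unsigned long long guid;
	unsigned long port;

	errno = 0;
	guid = strtoull(line, &end, 0);	/* base 0 accepts a 0x prefix */
	if (end == line || errno)
		return -1;

	line = end;
	errno = 0;
	port = strtoul(line, &end, 10);
	/* InfiniBand port numbers fit in 8 bits. */
	if (end == line || errno || port > 255)
		return -1;

	/* Allow only trailing whitespace after the port number. */
	while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
		end++;
	if (*end != '\0')
		return -1;

	out->guid = guid;
	out->port = (unsigned)port;
	return 0;
}
```

A caller would typically read the file with `fgets()` and pass each line to `parse_ignore_line()`, listing both ends of a link on separate lines as the thread describes.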
diff --git a/opensm/osm_console.c b/opensm/osm_console.c
index 0f80bdb..c065453 100644
--- a/opensm/osm_console.c
+++ b/opensm/osm_console.c
@@ -178,7 +178,7 @@ static void help_status(FILE * out, int detail)
 
 static void help_logflush(FILE * out, int detail)
 {
-	fprintf(out, "logflush -- flush the opensm.log file\n");
+	fprintf(out, "logflush [on|off] -- toggle opensm.log file flushing\n");
 }
 
 static void help_querylid(FILE * out, int detail)
@@ -599,7 +599,21 @@ static void sweep_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 
 static void logflush_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
 {
-	fflush(p_osm->log.out_port);
+	char *p_cmd;
+
+	p_cmd = next_token(p_last);
+	if (!p_cmd ||
+	    (strcmp(p_cmd, "on") != 0 && strcmp(p_cmd, "off") != 0)) {
+		fprintf(out, "Invalid logflush command\n");
+		help_sweep(out, 1);
+	} else {
+		if (strcmp(p_cmd, "on") == 0) {
+			p_osm->log.flush = TRUE;
+			fflush(p_osm->log.out_port);
+		}
+		else
+			p_osm->log.flush = FALSE;
+	}
 }
 
 static void querylid_parse(char **p_last, osm_opensm_t * p_osm, FILE * out)
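For readers who want to try the on/off handling outside opensm, here is a minimal standalone sketch of the same toggle logic. The names `demo_log`, `demo_help_logflush` and `demo_logflush` are stand-ins invented for this example; only the behaviour (set the flush flag, flush immediately on "on", reject anything else) follows the diff above. One difference is deliberate: on invalid input the diff calls `help_sweep()`, while this sketch prints the logflush help text, which may be what was intended.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the log state touched by the patch; hypothetical type. */
struct demo_log {
	FILE *out_port;
	bool flush;
};

static void demo_help_logflush(FILE *out)
{
	fprintf(out, "logflush [on|off] -- toggle opensm.log file flushing\n");
}

/*
 * Apply a "logflush" argument. Returns 0 on success, -1 if the
 * argument is missing or is neither "on" nor "off".
 */
static int demo_logflush(struct demo_log *log, const char *arg, FILE *out)
{
	if (arg && strcmp(arg, "on") == 0) {
		log->flush = true;
		fflush(log->out_port);	/* push out anything already buffered */
		return 0;
	}
	if (arg && strcmp(arg, "off") == 0) {
		log->flush = false;
		return 0;
	}
	fprintf(out, "Invalid logflush command\n");
	demo_help_logflush(out);
	return -1;
}
```

Returning a status code instead of `void` is a choice made for this sketch so the result can be checked by a caller; the console handlers in the diff return nothing.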